Wednesday, 7 March 2012

Confusion between 2 queries

Hi

I was hoping that someone will be able to explain to me what the difference is between these two queries, because i get different results when i run them.

I'm trying to count the distinct unique email addresses

Code Snippet

SELECT DISTINCT Prefix_code, Email

FROM tmpmedschemereal

WHERE email like '%@.%'

AND prefix_Code = 'UNW'

GROUP BY prefix_Code, email

return 1939 rows

Code Snippet

SELECT DISTINCT Prefix_code, COUNT(Email)

FROM tmpmedschemereal

WHERE email like '%@.%'

AND prefix_Code = 'UNW'

GROUP BY prefix_Code, email

returns 2006 rows

Why is there a difference, as far as i know they are suppose to return the same amount of rows, but there must be something that i misunderstand.

Will someone please be able to clarify for me what it is?

Carel:

This has to do with the basic function of the COUNT aggregate; for example:

Code Snippet

declare @.aTable table
( prefix_code varchar(5),
email varchar(20)
)
insert into @.aTable
select '$5', 'jax@.jax.com' union all
select 'UNW', 'jax@.jax.com' union all
select 'UNW', 'jim@.jax.com'

select distinct prefix_code, email
from @.aTable
group by prefix_code, email

/*
prefix_code email
-- --
$5 jax@.jax.com
UNW jax@.jax.com
UNW jim@.jax.com
*/


select distinct prefix_code, count(email)
from @.aTable
group by prefix_code, email

/*
prefix_code
-- --
$5 1
UNW 1
*/

Note that BOTH the jax@.jax.com and jim@.jax.com would produce a line with PREFIX_CODE = 'UNW' and COUNT = 1; however, because of the DISTINCT clause and because these two rows are identical in the second query, one of the rows is eliminated.

Not so in the first query. Because the attribute is listed instead of the count, each row remains unique and neither row is dropped.

|||

Hi Carel,

The

SELECT DISTINCT Prefix_Code, email

is eliminating duplicates in the resultset, whereas with the

SELECT DISTINCT Prefix_Code, count( Email),

duplicates are being counted.

Here is an example to illustrate the issue:

SET NOCOUNT ON

DECLARE @.MyTable table
( RowID int IDENTITY,
MyName varchar(20),
MyEmail varchar(50)
)

INSERT INTO @.MyTable VALUES ( 'Bill', 'Bill@.MyDomain.com' )
INSERT INTO @.MyTable VALUES ( 'Will', 'Will@.MyDomain.com' )
INSERT INTO @.MyTable VALUES ( 'Jill', 'Jill@.MyDomain.com' )
INSERT INTO @.MyTable VALUES ( 'Bill', 'Bill@.MyDomain.com' )
INSERT INTO @.MyTable VALUES ( 'Bill', 'Bill@.MyDomain.com' )

SELECT
MyName,
MyEmail
FROM @.MyTable
GROUP BY
MyName,
MyEmail

/*
MyName MyEmail
-- --
Bill Bill@.MyDomain.com
Jill Jill@.MyDomain.com
Will Will@.MyDomain.com
*/

SELECT
MyName,
EmailCount = count( MyEmail )
FROM @.MyTable
GROUP BY
MyName,
MyEmail

/*
MyName EmailCount
-- --
Bill 3
Jill 1
Will 1
*/

Also, you don't need to use DISTINCT when you use GROUP BY -they do the same thing.|||

And the 'truth' is, as always, somewhere in between the seemingly contrary, but accurate, explanations.

Kent and I have approached your question from two different perspectives, answering two slight variations of the question you posed. Both are accurate.

|||

Well said, Arnie; admiring the work.

|||

Thanks guys

I was just extremely confused as to why it did that.

I wish i could mark you both right.

Thanks for all the help, yet again.

|||

I thought you could mark both answers correct; in any case, I gave Arnie credit as you indicated.

( I know how it is Carel; Arnie helps me out a lot too. )

( And you are welcome, Carel )

|||

Okay, thats cool.

Thanks Kent.

Makes me feel better, because Arnie helps me out a lot.

No comments:

Post a Comment