Wednesday, 7 March 2012

Confused over query cost of execution plan

I am a bit confused by the query cost information displayed by the
execution plan.
First request:
I am using a function to get the required data. The implementation is quite
complex so I am not posting it now. I hope it suffices to mention that the
function has a primary key defined on two of the columns returned. Now my
query to get the data from the function would take the form:
select * -- normally I have a selected list of columns here.
from fn_name (param1, param2, param3)
Another query which uses the primary key would be in the form:
select *
from fn_name (param1, param2, param3)
where pkcolumn > 0 -- the primary key column is made up of positive
integer values.
When I display the execution plan, the "Query cost relative to the batch"
values are 85% and 15% (rounded) for the two queries respectively. The first
does a clustered index scan and the second a clustered index seek. However,
when I compare the run times of each query they both fare almost the same
(the first averages 105 ms and the second 100 ms).
My understanding was that if both queries took say 200 ms to run, then the
first query would account for 170 ms (85% of the batch) and the second one
would take 30 ms. Where am I mistaken?
Second request:
I execute a stored procedure. The execution plan shows 100 queries. The last
two queries are for two select statements from the inserted and deleted
tables from the trigger of a table being updated. No other actions are taken
by the trigger (because conditions not satisfied). However, the last two
trigger queries account for 84% of the batch (42% each). Running the
procedure when trigger is disabled or enabled does not show any significant
difference in run time (around 2 secs). What lesson have I missed?
Thanks,
Raj|||Query cost does not just take into consideration the time it takes to run a
query. It also looks at the IO required, to either RAM or disk. On a heavily
loaded system, the IO cost has a more dramatic effect on performance than
CPU or time cost, as queries might wait for IO operations that are queued.
Regards
--
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland
IM: mike@epprecht.net
MVP Program: http://www.microsoft.com/mvp
Blog: http://www.msmvps.com/epprecht/
"Khooseeraj Moloye" <rkmoloye@.hotmail.com> wrote in message
news:%23vmkYuYXFHA.320@.TK2MSFTNGP10.phx.gbl...
>I am a bit with the query cost information displayed by the
>execution plan.
> First request:
> I am using a function to get the required data. The implementation is
> quite complex so I am not posting it now. I hope it suffices to mention
> that the function has a primary key defined on two of the columns
> returned. Now my query to get the data from the function would take the
> form:
> select * -- normally I have a selected list of columns here.
> from fn_name (param1, param2, param3)
> Another query which uses the primary key would be in the form:
> select *
> from fn_name (param1, param2, param3)
> where pkcolumn > 0 -- the primary key column is made up of positive
> integer values.
> When I display the execution plan, the "Query cost relative to the batch"
> values are 85 and 15 (rounded) for each query respectively. The first is
> doing a Clustered scan and the second a Clustered s. However, when I
> compare the run times of each query they both fare almost the same (first
> averages 105 ms and second 100 ms).
> My understanding was that if both queries took say 200 ms to run, then the
> first query would account for 170 ms (85% of the batch) and the second one
> would take 30 ms. Where am I mistaken?
> Second request:
> I execute a stored procedure. The execution plan shows 100 queries. The
> last two queries are for two select statements from the inserted and
> deleted tables from the trigger of a table being updated. No other actions
> are taken by the trigger (because conditions not satisfied). However, the
> last two trigger queries account for 84% of the batch (42% each). Running
> the procedure when trigger is disabled or enabled does not show any
> significant difference in run time (around 2 secs). What lesson have I
> missed?
> Thanks,
> Raj
>|||Mike,
Thanks for the response. However, I am still left confused!
It all ought to come down to time in the end, in my humble opinion. That's
the essence of the query optimiser - to save time, isn't it? If it is not,
what IS the query cost really; some sort of value meaningful only to the
optimiser? How to explain that 2 simple select statements in a trigger
outweigh the other 98 complex queries by 84:16? That's where I am perplexed!
There ought to be a reasonable answer for this.
Sorry to have put in more questions here... but some clearer answers are
most welcome.
Cheers,
Raj|||Unfortunately, the cost of multi-statement table-valued UDFs is
not taken into account in the execution plan for the calling query.
Instead, it gives you only the cost of executing the query as if the
result of the UDF were a table (with no cost of its own).
If your UDF is performing only a single statement, you may want to
change it to an inline table-valued UDF, which is taken into account in
the execution plan of the calling query (as if the UDF were a view).
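For illustration, the two UDF styles differ roughly as follows. This is a minimal sketch; the table and column names (dbo.SomeTable, id, val) are hypothetical and stand in for the poster's undisclosed implementation:

```sql
-- Multi-statement table-valued UDF: the calling query's plan treats
-- the result as a pre-built table with no cost of its own.
CREATE FUNCTION dbo.fn_multi (@param1 int)
RETURNS @result TABLE (pkcolumn int PRIMARY KEY, val varchar(50))
AS
BEGIN
    INSERT @result (pkcolumn, val)
    SELECT id, val FROM dbo.SomeTable WHERE id > @param1
    RETURN
END
GO

-- Inline table-valued UDF: expanded into the calling query like a
-- view, so its cost appears in the calling query's execution plan.
CREATE FUNCTION dbo.fn_inline (@param1 int)
RETURNS TABLE
AS
RETURN (SELECT id AS pkcolumn, val FROM dbo.SomeTable WHERE id > @param1)
GO
```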
If your UDF is performing more statements and you want to analyze the
execution plan for those statements, you must copy the statements in
Query Analyzer, replace the parameters with variables and analyse those
statements only. If needed, you may want to add a "SELECT * FROM
@result WHERE ..." at the end, to compare the cost of those
statements with the cost that was shown for the calling query.
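In practice, that analysis might look like the following in Query Analyzer. The parameter names and types are assumed, since the original UDF was not posted:

```sql
-- Declare local variables in place of the UDF's parameters.
DECLARE @param1 int, @param2 int, @param3 int
SELECT @param1 = 1, @param2 = 2, @param3 = 3

-- Paste the UDF's body statements here (the ones that populate
-- the @result table variable) and display the plan for them alone.

-- Optionally approximate the calling query's filter at the end:
-- SELECT * FROM @result WHERE pkcolumn > 0
```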
On the other hand, the query cost is not directly linked with time
(as Mike already said), although it should be a rough estimate of
time. It is interesting to read the following KB article:
http://support.microsoft.com/kb/q285996
Razvan|||Thanks for the info, Razvan. The link was interesting reading, although it
did not relate to my post.
Given that the udf is table-valued type, running the same query with same
output, only differing on index usage still yields a high cost ratio,
whereas both queries actually took the same time to run. In fact the same
can be observed for a table as well. Maybe I am doing tests with relatively
small datasets (2-5000 records) so there is not much difference. I think the
impact would be huge when doing the same exercise with much much larger
datasets (the execution plan would still remain the same?).
Some of my understanding now about the query cost is that it is not related
to the time taken to actually run the query at all (although ultimately it
has an impact on run time). It rather determines the best way to run the
query with regard to indexes and so on. It also seems to me the optimiser is
built around some assumptions (like seek is good, scan is bad, generally)
and there are standard cost values associated, such that a clustered index
seek has a lower cost associated than a clustered index scan.
Thanks again,
Raj|||The cost is not necessarily correlated with the time to execute the query.
Cost is a relative measurement of the general "cost" to execute the query;
this includes the usage of system resources (CPU, RAM, IO, etc.).
Now, generally speaking, the higher the cost, the longer the execution time,
but you'll not see an exact match to the milliseconds required to execute
the query.
The cost is a better measurement than time as you move from system to
system.
Depending on hardware differences, etc., running a query on machine A at 2 ms
may lead you to believe that it performs well. Then you move the same query
to a different machine (with more data, etc.) and the thing tanks.
If you focus on the subtree cost in the query plan, you get more consistent
data on which to base your decisions about a query's overall performance.
In a nutshell: if the subtree cost is high, regardless of how FAST the query
returns, the thing is likely not optimized.
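To see the subtree cost as an absolute number rather than a batch percentage, one option is SHOWPLAN_ALL, which returns the estimated plan without executing the statements. The query below is a placeholder based on the original post (fn_name and the argument values are assumed):

```sql
SET SHOWPLAN_ALL ON
GO
SELECT * FROM fn_name (1, 2, 3) WHERE pkcolumn > 0
GO
SET SHOWPLAN_ALL OFF
GO
-- The TotalSubtreeCost column shows the estimated cumulative cost
-- of each operator and everything beneath it.
```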
hope this makes some sense. If you want more info, post back.
GAJ|||GAJ,
Thanks for the very instructive lessons. Yes, I totally agree with what you
explained. I do not deny the merits of cost measurements in comparison to
time measurements (I admit I am still learning more and more on the matter
as I am digging deeper, thanks to the replies on this forum, not only to my
posts but others' as well).
My focus is not on time but rather the relative (I underline, relative)
difference in cost measurements. The ratio seems too big. I only took the
time measurements as a comparison of the two queries. Remember I am not
talking about running the same query on two different machines; it is rather
two slightly different queries returning the same dataset run on the same
machine. I think no matter which machine those two queries were put on, the
relative difference (the ratio) of the cost measurements would not change
much (under correction from the experts!).
And the ratio is much bigger when it comes to the trigger query.
Unexplainable? My next "struggle" then will be to understand how the
optimiser works and renders its execution plans. That will surely shed more
light.
I hope not to be seen as being rude asking to read my original post again.
Thanks.
Raj|||Raj,
The optimizer must make many guesses about query cost.
One place you can look to understand why the estimates don't
match reality is at the estimated rowcount number in the popup
information from the graphical plan operator. You can compare
these with the actual counts, which will show (along with the
estimated count and estimated, but not actual, costs) if you choose
Show Execution Plan in Query Analyzer, then run the query.
For the trigger query, the optimizer probably can only guess how
many rows will be in the inserted and deleted tables - if the estimated
rowcounts here or elsewhere are far from the actual values, that would
explain some of the inconsistencies.
In addition to inaccurate rowcount estimates (usually because there is
no way to be accurate without running the query), percentage estimates
will differ from reality in other situations, such as:
User-defined function calls other than in-line table-valued
functions (scalar UDF calls are not considered in the cost).
Procedures with program flow logic that can't be evaluated
until runtime (the optimizer will cost out all branches, even
though some will ultimately have zero cost).
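One way to compare estimated and actual rowcounts side by side is STATISTICS PROFILE, which executes the query and returns one row per plan operator. The query shown is a placeholder based on the original post (fn_name and the argument values are assumed):

```sql
SET STATISTICS PROFILE ON
GO
SELECT * FROM fn_name (1, 2, 3) WHERE pkcolumn > 0
GO
SET STATISTICS PROFILE OFF
GO
-- Compare the Rows column (actual) against EstimateRows; a large
-- discrepancy usually explains a misleading cost percentage.
```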
Steve Kass
Drew University
|||You're not being rude at all.
GAJ
