<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>sqlity.net</title>
	
	<link>http://sqlity.net/en</link>
	<description>Quality for SQL</description>
	<lastBuildDate>Fri, 18 May 2012 13:05:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/sqlitynet" /><feedburner:info uri="sqlitynet" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Capturing Parameters of a Stored Procedure Call</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/JwkEkNMlW3Y/</link>
		<comments>http://sqlity.net/en/976/capturing-parameters-of-a-stored-procedure-call/#comments</comments>
		<pubDate>Fri, 18 May 2012 13:05:28 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Tracing]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=976</guid>
		<description><![CDATA[Recently a user on the PSSUG mailing list asked how he could capture the parameters passed to a stored procedure on every execution. In this article I would like to show a few different options to accomplish that.]]></description>
			<content:encoded><![CDATA[
<div>
<h3>Introduction</h3>
<p>
Recently a user on the PSSUG mailing list asked how he could capture the parameters passed to a stored procedure on every execution. In this article I would like to show a few different options to accomplish that.
</p>
<h3>Tracing</h3>
<p>
The first option you have is to run a trace. There are two events of interest here: <span class="tt">sp:starting</span> and <span class="tt">RPC:starting</span>. When setting up the trace, both events can be filtered on the <span class="tt">ObjectName</span> column to only capture calls of the procedure in question. <span class="tt">sp:starting</span> can also be filtered on the <span class="tt">ObjectId</span> column for the same purpose.
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/05/procedure_execution_trace_events.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/05/procedure_execution_trace_events.png" alt="procedure execution trace events Capturing Parameters of a Stored Procedure Call" title="Procedure Execution Trace Events" width="847" height="536" class="aligncenter size-full wp-image-978" /></a>
</p>
<p>
<span class="tt">sp:starting</span> captures the lines in the input buffer that make up the current call to the procedure. If the procedure is called directly from an application, you will be able to see that parameters in here. The <span class="tt">TextData</span> column will contain something like this:
</p>
<p>
<code>
EXEC dbo.Traceable @P1 = 17, @P2 = 'StrVal1', @P3 = '2012-03-04 05:06:07.08';
</code>
</p>
<p> 
If however the procedure gets called in a batch or from within another procedure and the parameters passed are stored in variables, you are out of luck as you will only get to see the names of those variables:
</p>
<p>
<code>
EXEC dbo.Traceable @P1 = @V1, @P2 = @V2, @P3 = @V3;
</code>
</p>
<p> 
The <span class="tt">RPC:starting</span> event fires only, if the procedure is called directly from an application. It does not fire at all when the stored procedure call is part of a batch or from within another procedure, so it does not help with the above problem either. However, it does have the advantage that it also fires, if the call from the application did not result in an execution, for example because a parameter name was messed up. In that case you will get all parameters of the erroneous call listed, again in the <span class="tt">TextData</span> field:
</p>
<p>
<code>
exec dbo.Traceable @P1=19,@P2=N'StrVal4',@WrongParameter='2012-05-16 19:15:59.297'
</code>
</p>
<p> 
Because the procedure won't start executing in this case, the <span class="tt">sp:starting</span> event does not fire at all.
</p>
<p>
So both events do not cover all cases. If you are trying to see all calls to a procedure from an application, go with <span class="tt">RPC:starting</span>. If on the other hand the procedure gets executed as part of a bigger batch or from within another procedure, use <span class="tt">sp:starting</span> but be aware that variable usage might hide the actual parameter values from you.
</p>
<h3>Active Logging</h3>
<p>
If the two mentioned trace events do not get you where you need to be, you have another option:
Rename the original procedure and create a new one that first logs the parameters to a table and then calls the (renamed) original procedure.
</p>
<p>
The following script creates a procedure <span class="tt">dbo.Traceable</span>. This is the procedure we want to trace. Its original behavior is to return a single result set.
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.Traceable') IS NOT NULL DROP PROCEDURE dbo.Traceable;
GO

CREATE PROCEDURE dbo.Traceable
@P1 INT,
@P2 NVARCHAR(33),
@P3 DATETIME
AS
BEGIN
  SELECT @P1 [dbo.Traceable], @P2 got, @P3 [called!];
END
</pre>
</div>
<p>
To setup logging for this procedure you can run the following script:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
BEGIN TRAN

EXEC sp_rename @objname = 'dbo.Traceable', @newname = 'Traceable_Spyed';

CREATE TABLE dbo.Traceable_Log
  (
    Id INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED ,
    P1 INT ,
    P2 NVARCHAR(33) ,
    P3 DATETIME ,
    LogDTime DATETIME2 DEFAULT SYSDATETIME()
  );

GO
CREATE PROCEDURE dbo.Traceable
@P1 INT,
@P2 NVARCHAR(33),
@P3 DATETIME
AS
BEGIN
  INSERT  INTO dbo.Traceable_Log
          ( P1, P2, P3 )
  VALUES  ( @P1, @P2, @P3 );
  
  EXEC dbo.Traceable_Spyed @P1 = @P1, @P2 = @P2, @P3 = @P3;
END
GO
COMMIT; 
</pre>
</div>
<p>
When modifying this script for your own purposes, make sure that the <span class="tt">@newname</span> parameter of <span class="tt">sp_rename</span> does not contain the schema name. Otherwise the renamed object will end up with a name like <span class="tt">[dbo].[dbo.Traceable_Spyed]</span> which will be very inconvenient. You also might want to check that <span class="tt">dbo.Traceable_Spyed</span> does not exist already to protect against double execution.
</p>
<p> 
With this in place, every successful call to the procedure will get logged. The log table also establishes an execution order and has timestamps:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/05/procedure_execution_log.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/05/procedure_execution_log.png" alt="procedure execution log Capturing Parameters of a Stored Procedure Call" title="Procedure Execution Log" width="565" height="121" class="aligncenter size-full wp-image-979" /></a>
</p>
<p>
To uninstall this logging wrapper, just execute the following code:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
BEGIN TRAN;
   DROP PROCEDURE dbo.Traceable;
   EXEC sp_rename @objname = 'dbo.Traceable_Spyed', @newname = 'Traceable';
COMMIT; 
</pre>
</div>
<p>
Again you might want to add code to protect from double execution by checking that <span class="tt">dbo.Traceable_Spyed</span> does exist before dropping <span class="tt">dbo.Traceable</span>.
</p>
<h3>Alternative Solution</h3>
<p>
Both above mentioned options have their drawbacks. The first might not capture all parameters in all cases and might therefore just not work at all for you. The second one requires that you have the ability to rename objects and replace them with your own. There might be many reasons preventing you from doing that.
</p>
<p>
If you are stuck because of that, there is one more option: The <span class="tt">Showplan XML Statistics Profile</span> trace event. You can find more information about this event - including how to filter it - here: <a href="http://sqlity.net/en/952/tracing-the-actual-execution-plan-for-a-single-query/ ">Tracing the Actual Execution Plan for a single Query</a>
</p> 
<p>
This event captures the actual execution plan of the procedure (or batch) after each single execution. The execution plans are captured in XML format and stored in the <span class="tt">TextData</span> column. To access it from there after you inserted the trace data into a table, you have to cast the <span class="tt">TextData</span> column to XML:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT CAST(TextData AS XML) ExecutionPlan FROM dbo.TraceTable;
</pre>
</div>
<p>
Inside the execution plan look for the <span class="tt">ParameterList</span> node. It will look something like this:
</p>
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;ParameterList&gt;
  &lt;ColumnReference Column=&quot;@P3&quot; ParameterCompiledValue=&quot;'2012-03-04 05:06:07.080'&quot; ParameterRuntimeValue=&quot;'2012-03-04 05:06:07.080'&quot; /&gt;
  &lt;ColumnReference Column=&quot;@P2&quot; ParameterCompiledValue=&quot;N'StrVal1'&quot; ParameterRuntimeValue=&quot;N'StrVal1'&quot; /&gt;
  &lt;ColumnReference Column=&quot;@P1&quot; ParameterCompiledValue=&quot;(17)&quot; ParameterRuntimeValue=&quot;(17)&quot; /&gt;
&lt;/ParameterList&gt;
 </pre>
</div>
<p>
It does not only provide you with the parameter values of the current execution (<span class="tt">ParameterRuntimeValue</span>), it also lists the parameter values that were used when compiling the execution plan (<span class="tt">ParameterCompiledValue</span>). This can be very useful when you are trying to hunt down parameter sniffing issues.
</p>
<p>
However, you need to be aware that this is a fairly large event. So make sure you use the server side trace functionality instead of the Profiler UI and also make sure to have enough disk space available on the drive that you are writing the trace file to. You also should set a maximum file size in the trace definition.
</p>
<p>
Happy Tracing!
</p>
</div>
<img src="http://feeds.feedburner.com/~r/sqlitynet/~4/JwkEkNMlW3Y" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/976/capturing-parameters-of-a-stored-procedure-call/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/976/capturing-parameters-of-a-stored-procedure-call/</feedburner:origLink></item>
		<item>
		<title>The Mysterious “sp_” System Procedure Prefix</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/O2zTLbCSWkA/</link>
		<comments>http://sqlity.net/en/961/the-mysterious-sp_-system-procedure-prefix/#comments</comments>
		<pubDate>Sun, 13 May 2012 21:04:46 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server Internals]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=961</guid>
		<description><![CDATA[<p>
If you have been working with SQL Server for a while you probably know that you should not select names for your stored procedures that start with the three characters "sp_".
</p>
<p>
Today I would like to take a closer look at all the myths and facts around this prefix. All examples in this article have been tested on SQL 2008R2 and on SQL 2012.
</p>
]]></description>
			<content:encoded><![CDATA[<div>
<h3>Introduction</h3>
<p>
If you have been working with SQL Server for a while you probably know that you should not select names for your stored procedures that start with the three characters "sp_".
</p>
<p>
Today I would like to take a closer look at all the myths and facts around this prefix. This will include a series of examples, all of which have been tested on SQL 2008R2 and on SQL 2012.
</p>
<h3>Special Objects</h3>
<p>
The first myth I would like to bust is that the "sp_" prefix stands for "system procedure".
</p>
<p>
While the BOL entry for <a href="http://msdn.microsoft.com/en-us/library/ms190669(v=sql.105).aspx">Creating Stored Procedures</a> uses language that could easily be interpreted like that, it actually never says anything about the etymology of this prefix. In fact, the "sp_" prefix causes a special path to be taken in the resolution of the object name. This works for stored procedures as well as for other object types. Later on this article contains an example with a table.
</p>
<p>
As "sp_" designates more objects than just stored procedures as special, it clearly does not mean "System Procedure", even though it currently is only used for system procedures. Instead it just stands for "special".
</p>
<h3>Resource Database</h3>
<p>
The idea for this article came from a tweet in a discussion about this prefix started by @ericstephani. In the tweet in question @SirSQL suggested to check if <span class="tt">master</span> was indeed the database checked first, as everyone in the discussion had assumed before, or if it rather is the resource database. This makes a lot of sense, as most of the system objects live in the resource database. So I decided to check this out by running a series of tests.
</p>
<p>
If you want to follow along with these tests you will need two databases: One with the name <span class="tt">test</span> and one that is called <span class="tt">OtherDb</span>.
</p>
<p>
For the first test I created a procedure with the name <span class="tt">sp_executesql</span> in <span class="tt">master</span> as well as in <span class="tt">test</span> and then executed it from both places:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
USE master;
GO
CREATE PROCEDURE dbo.sp_executesql
@statement NVARCHAR(MAX)
AS
BEGIN
  SELECT DB_NAME() [called from], 'master' [sp_executesql];
END
GO
EXEC dbo.sp_executesql @statement = N'SELECT DB_NAME() [called from], ''system'' [sp_executesql];';
GO
DROP PROCEDURE dbo.sp_executesql;
GO
------------------------------------------------
GO
USE test;
GO
CREATE PROCEDURE dbo.sp_executesql
@statement NVARCHAR(MAX)
AS
BEGIN
  SELECT DB_NAME() [called from], 'test' [sp_executesql];
END
GO
EXEC dbo.sp_executesql @statement = N'SELECT DB_NAME() [called from], ''system'' [sp_executesql];';
GO
DROP PROCEDURE dbo.sp_executesql;
GO
</pre>
</div>
<p>
The <span class="tt">sp_executesql</span> version that gets shipped with SQL Server lives in the resource database and so we would expect this version to get executed in all cases. This is indeed correct, as you can see in the result:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/05/sp_executesql_test_results.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/05/sp_executesql_test_results.png" alt="sp executesql test results The Mysterious sp  System Procedure Prefix" title="sp_executesql test results" width="194" height="109" class="aligncenter size-full wp-image-965" /></a>
</p>
<p>
The above test shows that the resource database is checked first when an object name with the "sp_" prefix is encountered. That means that if you create any object with that name prefix you are running the risk that it will not be accessible anymore after the next service pack install if Microsoft decided to create an object with the same name in the resource database. This is even true when the resource database object is of a different type than your object. This you can quickly confirm by creating a table with the name <span class="tt">sp_executesql</span> in any database and trying to insert a row into it.
</p>
<h3>Master Piece</h3>
<p>
The fact that the resource database is the one checked first to resolve "sp_" object names raises the question where the <span class="tt">master</span> database fits in. Read on, as the results might surprise you.
</p>
<p>
The second test involves a procedure that is not (yet) part of the resource database: <span class="tt">sp_MyOwnProc</span>
</p>
<p>
I again created this procedure in <span class="tt">master</span> and in <span class="tt">test</span>:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
GO
USE master;
GO
CREATE PROCEDURE sp_MyOwnProc
AS
BEGIN
  SELECT DB_NAME() [called from], 'master' [sp_MyOwnProc];
END
GO
USE test;
GO
CREATE PROCEDURE sp_MyOwnProc
AS
BEGIN
  SELECT DB_NAME() [called from], 'test' [sp_MyOwnProc];
END
GO
EXEC sp_MyOwnProc;
GO
USE OtherDb;
GO
EXEC sp_MyOwnProc;
GO
USE test;
GO
DROP PROCEDURE sp_MyOwnProc;
GO
USE master;
GO
DROP PROCEDURE sp_MyOwnProc;
GO
</pre>
</div>
<p>
The above code creates the procedure in both places. It then executes the <span class="tt">EXEC sp_MyOwnProc;</span> statement, first from the <span class="tt">test</span> database and afterwards from the <span class="tt">OtherDb</span> database. The result is shown below:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/05/sp_MyOwnProc_test_results.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/05/sp_MyOwnProc_test_results.png" alt="sp MyOwnProc test results The Mysterious sp  System Procedure Prefix" title="sp_MyOwnProc test results" width="200" height="108" class="aligncenter size-full wp-image-966" /></a>
</p>
<p>
As you can see, <span class="tt">master</span> is not checked first for an "sp_" object. Instead the object in the current database is used. Only if the current database does not contain the object in question <span class="tt">master</span> is looked at to see if the object exists in there.
</p>
<p>
The same behavior can be observed when all references to the <span class="tt">sp_MyOwnProc</span> in the above script are schema-qualified with <span class="tt">dbo.</span>
</p>
<h3>Other Object Types</h3>
<p>
You also get the same behavior if you go through this exercise with a table:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
USE master;
GO
CREATE TABLE sp_MyOwnTable(
sp_MyOwnTable NVARCHAR(MAX)
);
INSERT INTO sp_MyOwnTable SELECT DB_NAME();
GO
USE test;
GO
CREATE TABLE sp_MyOwnTable(
sp_MyOwnTable NVARCHAR(MAX)
);
INSERT INTO sp_MyOwnTable SELECT DB_NAME();
GO
SELECT DB_NAME() [called from], * FROM sp_MyOwnTable;
GO
USE OtherDb;
GO
SELECT DB_NAME() [called from], * FROM sp_MyOwnTable;
GO
USE test;
GO
DROP TABLE sp_MyOwnTable;
GO
USE master;
GO
DROP TABLE sp_MyOwnTable;
GO
</pre>
</div>
<p>
This script creates a table with the name <span class="tt">sp_MyOwnTable</span> in the <span class="tt">master</span> as well as in the <span class="tt">test</span> database and then executes a select against this name executing in <span class="tt">test</span> as well as in <span class="tt">OtherDb</span>. The result is shown here:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/05/sp_MyOwnTable_test_reults.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/05/sp_MyOwnTable_test_reults.png" alt="sp MyOwnTable test reults The Mysterious sp  System Procedure Prefix" title="sp_MyOwnTable test reults" width="206" height="108" class="aligncenter size-full wp-image-967" /></a>
</p>
<p>
This shows that the name resolution works the same for tables as it does for stored procedures. It also works views, but not for functions or user defined types.
</p>
<h3>Performance Impact</h3>
<p>
Now we know that SQL Server tries to find an "sp_" object in the resource database first and only if it does not exist there the search is continued in the current database. So there should be a measurable performance impact showing this extra work.
</p>
<p>
To measure the impact I used this script:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
USE test;
GO
IF OBJECT_ID('dbo.sp_MyOwnProc2') IS NOT NULL DROP PROCEDURE dbo.sp_MyOwnProc2;
GO
CREATE PROCEDURE dbo.sp_MyOwnProc2
AS
RETURN 0;
GO
IF OBJECT_ID('dbo.MyOwnProc2') IS NOT NULL DROP PROCEDURE dbo.MyOwnProc2;
GO
CREATE PROCEDURE dbo.MyOwnProc2
AS
RETURN 0;
GO
------------------------------------
GO
DECLARE @StartTime DATETIME2 = SYSDATETIME();
DECLARE @EndTime DATETIME2 = SYSDATETIME();
DECLARE @Counter INT = 0;
DECLARE @CmdA NVARCHAR(100) = 'EXEC dbo.MyOwnProc2;--';
DECLARE @CmdB NVARCHAR(100) = 'EXEC dbo.sp_MyOwnProc2;--';
DECLARE @TimeA BIGINT=0;
DECLARE @TimeB BIGINT=0;
DECLARE @Cmd2 NVARCHAR(100);
WHILE(@Counter&lt;10000000)
BEGIN
  SET @Cmd2 = @CmdA + CAST(@Counter AS NVARCHAR(20));
  SET @StartTime = SYSDATETIME();
  EXEC(@Cmd2);
  SET @EndTime = SYSDATETIME();
  SET @TimeA += DATEDIFF(microsecond,@StartTime,@EndTime)
  SET @Cmd2 = @CmdB + CAST(@Counter AS NVARCHAR(20));
  SET @StartTime = SYSDATETIME();
  EXEC(@Cmd2);
  SET @EndTime = SYSDATETIME();
  SET @TimeB += DATEDIFF(microsecond,@StartTime,@EndTime)
  SET @Counter += 1;
END
SELECT @Counter Counter,@TimeA [MyOwnProc2], @TimeB [sp_MyOwnProc2];
GO
</pre>
</div>
<p>
It first creates two identical stored procedures that do nothing but return a 0. The first one is called <span class="tt">sp_MyOwnProc2</span>, the second one carries the name <span class="tt">MyOwnProc2</span> without the "sp_" prefix. To measure the performance impact, the script calls both procedures alternating in a loop and records their execution times. Each call gets executed as dynamic sql with the current loop count being part of the sql string. This prevents any possible attempts to cache the plan for the batch. It does however not prevent caching the plan for the procedure itself. That is okay, as we are after the name resolution piece of the execution. Because both procedures get called alternatingly, any background noise caused by other processes on the test system should affect both evenly.  
</p>
<p>
On my system I got these results:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/05/sp_prefix_performance_test_results.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/05/sp_prefix_performance_test_results.png" alt="sp prefix performance test results The Mysterious sp  System Procedure Prefix" title="sp_ prefix performance test results" width="283" height="63" class="aligncenter size-full wp-image-968" /></a>
</p>
<p>
After 10 million executions of each of the two procedures you can see, that there is a performance impact. With less than 2 percent, however, it is very small compared to the time it takes to just call the procedure. Remember, that the procedures in this test did not actually do any work, so all the time recorded by this test was spent on identifying the procedure and the overhead of calling it.
</p>
<h3>Conclusion</h3>
<p>
The "sp_" object name prefix causes SQL Server to take a special route when resolving the name of this object: First SQL Server checks if the object exists in the hidden resource database. Second it tries to find the object in the current database and if it does not exist there SQL Server goes on to check if the object exists in the <span class="tt">master</span> database.
</p>
<p>
If you name your own objects using this prefix, you have to be aware of two possible consequences:
First there is a small but measurable impact on performance. Second it can cause your application to suddenly break, if Microsoft decides to add an object with the same name to the resource database.
</p>
<p>
Because of that it is a best practice to not use the "sp_" name prefix anywhere in your code. 
</p>
<p>
There is one exception however: If you are creating an object that you want to be accessible from all databases, you can use this prefix and place the object in the <span class="tt">master</span> database. However, because of the database precedence there are now two possibilities for those objects to get eclipsed by another object. So, if you go this route, make sure to regularly check that the object you are trying to execute is actually the one executing.
</p>
</div>
<img src="http://feeds.feedburner.com/~r/sqlitynet/~4/O2zTLbCSWkA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/961/the-mysterious-sp_-system-procedure-prefix/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/961/the-mysterious-sp_-system-procedure-prefix/</feedburner:origLink></item>
		<item>
		<title>Tracing the Actual Execution Plan for a single Query</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/fkuTh1b_22k/</link>
		<comments>http://sqlity.net/en/952/tracing-the-actual-execution-plan-for-a-single-query/#comments</comments>
		<pubDate>Sat, 05 May 2012 21:18:30 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=952</guid>
		<description><![CDATA[SQL Server offers several ways to get to the execution plan for a particular query. Most of them however only provide the estimated execution plan without actual counts and statistics. In this article we are going to look at a way to get to the actual execution plan of a particular query using a trace.]]></description>
			<content:encoded><![CDATA[<div>
<h3>Introduction</h3>
<p>
SQL Server offers several ways to get to the execution plan for a particular query. Most of them however only provide the estimated execution plan without actual counts and statistics. In this article we are going to look at a way to get to the actual execution plan of a particular query.
</p>
<h3>Tracing</h3>
<p>
The certainly easiest way to get the actual plan of a query is to actually execute it in SSMS while the "collect actual execution plan" option is turned on. However, sometimes you want to collect the actual execution plan for a query every time it gets executed in your production environment. The only way to do this in SQL Server 2008R2 and earlier was to run a trace and collect the "Showplan XML Statistics Profile" event.
</p>
<p>
The problem with this approach is that it is not trivial to restrict the collected information to execution plans of one query. That means that on a busy system thousands of these events will fire in a very short time. With the included XML execution plan the amount of data to be collected will be quickly overloading most production systems.
</p>
<h3>Filtering Options</h3>
<p>
If the statement you are interested in is inside of a stored procedure you are somewhat in luck, as it is simple to filter by the name of the procedure which is returned by the event in the ObjectName column.
</p>
<p>
However, if the statement is an ad hoc or a prepared statement, this simple option does not exist anymore. There are a few other columns you can filter by like the login name, but most of the time you won't have the option to single out a specific query to use different connection settings.
</p>
<p>
The next option would be to filter on the text of the query. However, the query text is unfortunately not included in this event &ndash; neither as separate column nor within the execution plan XML.
That leaves only one column that we can try to filter on, the execution plan itself.
</p>
<p>
While the execution plan is stripped of most of the query text, there are two types of names that are included in the execution plan: Parameter names and table alias names. Parameter names are only included if the statement is a prepared statement. Table alias names will be included in all types of statements that access a table. Be aware, that column alias names are not included in the execution plan. 
</p>
<h3>Random Alias</h3>
While most often the table alias names are not unique between all the queries in a system, it is usually not too difficult to change a particular query to use some distinct character string as an alias name for one table. A change like that will not modify the query behavior nor will it influence the plan choice of the optimizer but it will us something to filter by.
</p>
<p>
The easiest way to come up with such an identifying name is to just use a random character string like "ir83n476s9d". Make sure however that the name of your choice begins with a letter. It also should not contain any special characters.
</p>
<p>
Once your query is prepared like this, you can easily restrict a "Showplan XML Statistics Profile" trace to only include this query by using a "Like" filter on the TextData column and setting it to "%ir83n476s9d%".
</p>
<h3>Conclusion</h3>
<p>
This simple trick allows you to filter the "Showplan XML Statistics Profile" event in a trace to fire only for a particular query. While it is not always possible to change a query to accommodate this, in many cases it is easy to do.
</p>
<p>
On final word of caution: When running traces in a production environment you should never use the SQL Profiler, especially when dealing with high-volume events like the "Showplan XML Statistics Profile" event. Instead setup your trace as a server-side trace to write the collected data to a fast, preferably dedicated drive. That way the performance impact on the system will be kept as small as possible.
</p>
</div>
<img src="http://feeds.feedburner.com/~r/sqlitynet/~4/fkuTh1b_22k" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/952/tracing-the-actual-execution-plan-for-a-single-query/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/952/tracing-the-actual-execution-plan-for-a-single-query/</feedburner:origLink></item>
		<item>
		<title>The Unloved Backward Scan</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/cJIF-YblauA/</link>
		<comments>http://sqlity.net/en/930/the-unloved-backward-scan/#comments</comments>
		<pubDate>Sun, 29 Apr 2012 20:31:29 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server Internals]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=930</guid>
		<description><![CDATA[<p>
When a query requires rows to be sorted, either directly requested with an [tt]ORDER BY[/tt] clause or because one of the iterators requires it, SQL Server has two options to guarantee that order. The obvious one is to utilize a sort iterator. If the data comes from an index (clustered or covering), SQL Server can also use an "Ordered Scan" of the data. Depending on the requested sort direction, such an ordered scan can be either forward or backward.
</p>
<p>
This would hardly be worth an article, if there wasn't the peculiarity that SQL Server obviously does not like the idea of having to execute an ordered scan that is directed backwards.
</p>]]></description>
			<content:encoded><![CDATA[<div>
<h3>Introduction</h3>
<p>
When a query requires rows to be sorted, either directly requested with an <span class="tt">ORDER BY</span> clause or because one of the iterators requires it, SQL Server has two options to guarantee that order. The obvious one is to utilize a sort iterator. If the data comes from an index (clustered or covering), SQL Server can also use an "Ordered Scan" of the data. Depending on the requested sort direction, such an ordered scan can be either forward or backward.
</p>
<p>
This would hardly be worth an article, if there wasn't the peculiarity that SQL Server obviously does not like the idea of having to execute an ordered scan that is directed backwards.
</p>
<h3>Backward Scan Dislike</h3>
<p>
Let's look at an example. First we need some tables with data:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.T1') IS NOT NULL DROP TABLE dbo.T1;
IF OBJECT_ID('dbo.T2') IS NOT NULL DROP TABLE dbo.T2;

CREATE TABLE dbo.T1(
Id INT CONSTRAINT T1_PK PRIMARY KEY CLUSTERED,
v1 INT,
c1 VARCHAR(8000)
);

CREATE TABLE dbo.T2(
Id INT CONSTRAINT T2_PK PRIMARY KEY CLUSTERED,
v1 INT,
c1 VARCHAR(8000)
);

INSERT INTO dbo.T1(Id,v1,c1)
SELECT n,CHECKSUM(NEWID()),'*'
FROM dbo.GetNums(1000000);

INSERT INTO dbo.T2(Id,v1,c1)
SELECT n,CHECKSUM(NEWID()),'*'
FROM dbo.GetNums(1000000);
--Get dbo.GetNums here: http://www.sqlmag.com/article/sql-server/virtual-auxiliary-table-of-numbers 
</pre>
</div>
<p>
This script is creating two tables containing one million rows each. The v1 column is filled with random values; the c1 column just contains a single constant character. The Id column is the clustered primary key and it is providing the ordering on disk that we are going to use.
</p>
<p>
The query is a little made up. It joins T2 to itself bringing back only one set of columns. It also restricts the result of this join to the first one million rows, sorted by T2.Id. The one million values in T2.v1 are random from a set of 4 billion making them "mostly" unique. That means the self-join is expected to return about one million rows anyway. Those rows are then joined to T1. In the end the <span class="tt">ORDER BY</span> requests that all rows be returned sorted by the Id column of the T1 table:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT  *
FROM    dbo.T1 A
        INNER JOIN ( SELECT TOP ( 1000000 )
                            B.*
                     FROM   dbo.T2 B
                            INNER JOIN dbo.T2 C ON B.v1 = C.v1
                     ORDER BY B.Id
                   ) BC ON A.Id = BC.Id
ORDER BY A.Id;
</pre>
</div>
<p>
The execution plan of this query looks like this:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Plan_with_Forward_Scan.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Plan_with_Forward_Scan.png" alt="Plan with Forward Scan The Unloved Backward Scan" title="Plan with Forward Scan" width="1123" height="248" class="aligncenter size-full wp-image-937" /></a>
</p>
<p>
There is nothing unexpected in this plan. The T2 self-join is done with a hash join operator, as there is no index on the v1 column. The data is then sorted by a sort operator and passed through a serial zone for the top operator. After that there is no additional sort operator for the data to pass through. As the merge join requires both streams to be sorted the same way on the join column, this means that the data must be produced sorted by the scan of the T1 table. A quick look at its properties confirms that:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Forward_Scan_Iterator_Properties.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Forward_Scan_Iterator_Properties.png" alt="Forward Scan Iterator Properties The Unloved Backward Scan" title="Forward Scan Iterator Properties" width="263" height="389" class="aligncenter size-full wp-image-934" /></a>
</p>
<p>
This scan is a forward scan, as both <span class="tt">ORDER BY</span> statements  are asking to sort by Id ascending. To make this scan go backwards we should just have to change the query to sort by Id descending in both places:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT  *
FROM    dbo.T1 A
        INNER JOIN ( SELECT TOP ( 1000000 )
                            B.*
                     FROM   dbo.T2 B
                            INNER JOIN dbo.T2 C ON B.v1 = C.v1
                     ORDER BY B.Id DESC
                   ) BC ON A.Id = BC.Id
ORDER BY A.Id DESC;
</pre>
</div>
<p>
 However, this query produces this rather unexpected plan:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Backwards_Scan_avoidance_with_Loop_Join.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Backwards_Scan_avoidance_with_Loop_Join.png" alt="Backwards Scan avoidance with Loop Join The Unloved Backward Scan" title="Backward Scan avoidance with Loop Join" width="1110" height="164" class="aligncenter size-full wp-image-933" /></a>
</p>
<p>
A loop join with an index seek against the T1 table to retrieve 1,000,000 single rows &ndash; that cannot be good.
</p>
<p>
Before we look at execution statistics let's force the merge join back by specifying a join hint:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT  *
FROM    dbo.T1 A
        INNER MERGE JOIN ( SELECT TOP ( 1000000 )
                                  B.*
                           FROM   dbo.T2 B
                                  INNER JOIN dbo.T2 C ON B.v1 = C.v1
                           ORDER BY B.Id DESC
                         ) BC ON A.Id = BC.Id
ORDER BY A.Id DESC;
</pre>
</div>
<p>
 Now we get the expected plan again:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Merge_Join_encouraged_Backward_Scan.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Merge_Join_encouraged_Backward_Scan.png" alt="Merge Join encouraged Backward Scan The Unloved Backward Scan" title="Merge Join encouraged Backward Scan" width="1117" height="250" class="aligncenter size-full wp-image-936" /></a>
</p>
<p>
If you compare the properties for the iterators in the forward scan and the backward scan plan you will notice, that the cost of the two scan directions is not that much different:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Backward_Scan_Iterator_properties.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Backward_Scan_Iterator_properties.png" alt="Backward Scan Iterator properties The Unloved Backward Scan" title="Backward Scan Iterator Properties" width="1082" height="572" class="aligncenter size-full wp-image-932" /></a>
</p>
<p>
The estimated cost of the scan iterator is 0% in both cases. The bulk of the work is done in the hash self-join and the following sort of the T2 table.
</p>
<p>
So, let us take a look at the execution statistics of all three queries:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Query_Execution_Statistics.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Query_Execution_Statistics.png" alt="Query Execution Statistics The Unloved Backward Scan" title="Query Execution Statistics" width="656" height="77" class="aligncenter size-full wp-image-938" /></a>
</p>
<p>
The execution statistics clearly show that the decision to go with a loop join was not necessarily the best. While the estimated cost for the backward scan is about 150% of the estimated cost for the loop join, the actual reads for the loop join version are about 3 million, which is more than 250 times higher than the 8000 reads for either scan direction.
</p>
<h3>Disk Access</h3>
<p>
To understand were this dislike is coming from we need to look at the access pattern necessary to retrieve the data from disk. A forward scan of a table (or index), that is not heavily fragmented, looks pretty much like doing a single contiguous read of a big blob of data. SQL Server has a lot of performance optimizations build in that make this type of access as fast as possible. The Read-Ahead mechanism is a good example for that.
</p>
<p>
A backward scan on the other hand looks quite the opposite: After reading a page, the disk has to do almost a complete turn to get to the previous page, which is the next one in line to be read. This is about the worst kind of random access you can come by. 
</p>
<p>
As SQL Server always assumes that none of the requested pages are in cache, the decision to not get into that backward spinning game seems quite understandable. 
</p>
<p>
Also, if you closely compare the two "scanning" execution plans you will notice that while the forward scan iterator is parallelized, the backward scan operator is executed single threaded. This is a general rule: SQL Server cannot execute a backward scan in parallel. That makes sense as with all that waiting for the disk to do another spin to get to the "next" page, it is unlikely for the data to come in quickly enough to keep multiple threads busy.
</p>
<h3>Estimates?</h3>
<p>
But wait, there is one more thing: If you look at the image above and compare the properties of all iterators in the two scanning plans, you will notice something odd. The estimated CPU cost of the "hash match inner join" iterator goes from 11 in the "forward" case to 1357 in the "backward" case; the estimated IO cost makes a similar jump from 0 to 1654. That seems to not make a lot of sense, as the next iterator, the sort, is a blocking iterator. "Blocking" means that the iterator reads in <u>all</u> input rows into a holding area before producing any output rows. This implies that anything on the left side of a sort should not be able to influence the cost of operators on the right side of it. That is, unless it affects the number of rows significantly, as this is a TOP N Sort. See <a href="http://sqlity.net/en/908/top-n-sort-a-little-bit-of-sorting/"> TOP N Sort – A Little Bit of Sorting </a> for an explanation. But as in this example all columns are "mostly" unique, the optimizer guesses correctly that about one million rows get passed into the hash join on each input and also about one million rows come out the other end. So row count estimates should not matter here. 
</p>
<p>
If you look at the estimates when executing just that inner query sorting backwards, you will see that they exactly match the ones in the imbedded forward sorting case:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Iterator_Properties_for_Inner_Select.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Iterator_Properties_for_Inner_Select.png" alt="Iterator Properties for Inner Select The Unloved Backward Scan" title="Iterator Properties for Inner Select" width="994" height="353" class="aligncenter size-full wp-image-935" /></a>
</p>
<p>
That means that the scan direction change of the T1 table causes the estimates on the other side of the merge join to get completely thrown off. 
</p>
<h3>Conclusion</h3>
<p>
SQL Servers dislike of backward scan operations is understandable when looking at the work necessary to retrieve the records from disk in opposite index order. Additionally, a backward scan cannot be executed in parallel. That adds to the list of reasons, why the backward scan seems unloved. 
</p>
<p>
However, the main reason why SQL Server avoids backward scans seems to be that they throw its cost estimations off, making following iterators appear a lot more expensive than they really are.
</p>
</div>
<img src="http://feeds.feedburner.com/~r/sqlitynet/~4/cJIF-YblauA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/930/the-unloved-backward-scan/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/930/the-unloved-backward-scan/</feedburner:origLink></item>
		<item>
		<title>TOP N Sort – A Little Bit of Sorting</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/3uyw39SCuvs/</link>
		<comments>http://sqlity.net/en/908/top-n-sort-a-little-bit-of-sorting/#comments</comments>
		<pubDate>Mon, 23 Apr 2012 00:39:06 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server Internals]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=908</guid>
		<description><![CDATA[Sorting is one of the most often used functions in SQL Server. It is used when creating indexes. It allows us to retrieve data in a particular order. It also is used to filter out repeated rows in the context of a DISTINCT request. The SQL Server Team spent a lot of effort optimizing the sort algorithms to give the best possible performance in each context. This blog post looks at the TOP N Sort optimization.]]></description>
			<content:encoded><![CDATA[<div>
<h3>Introduction</h3>
<p>
Sorting is one of the most often used functions in SQL Server. It is used when creating indexes. It allows us to retrieve data in a particular order. It also is used to filter out repeated rows in the context of a <span class="tt">DISTINCT</span> request. The SQL Server Team spent a lot of effort optimizing the sort algorithms to give the best possible performance in each context. They even implemented sort avoidance where the expensive sort gets removed from the query plan if it is known that the input is sorted already or will contain only a single distinct value. 
</p>
<p>
One of those optimizations is the TOP N Sort that I would like to look at in a little more detail today.
</p>
<h3>Memory Requirements for Sorting</h3>
<p>
Sorting requires memory &ndash; potentially a lot of memory. SQL Server estimates the size of the data using statistics and data size estimates and then requests about 150 percent of that in memory for the sort operation.
</p>
<p>To see how much memory a query is using, you can use the <span class="tt">sys.dm_exec_query_memory_grants</span> DMV. It shows the current memory usage for each SPID that requested additional memory for one (or more) of the memory consuming operators.
</p>
<p>
The problem with this DMV is that its values change quickly, so it is difficult to get meaningful information for a particular query. There is however a trick you can use: Every memory consuming iterator has a build phase where most of the memory is used, and a delivery phase in which the rows get send to the parent operator. The memory consumption during the build phase is usually a lot higher. SQL Server recognizes this and reuses the additional memory during the delivery phase for other memory consuming iterators.
</p>
<p>
The <span class="tt">sys.dm_exec_query_memory_grants</span> DMV contains the columns <span class="tt">used_memory_kb</span> and columns <span class="tt">max_used_memory_kb</span> that report current and maximum memory usage. If we can ensure that we get the data from this DMV during the delivery phase of the operator in question, the max_used_memory_kb column will tell us exactly how much memory was used during the build phase. This however only works for queries with a single memory consuming operator that also needs to be blocking.
</p>
<p>
Sort is a blocking operator. That means that during the build phase no rows get transmitted. If we include the request for the memory data in the query and place it before ("left of" in the graphical plan) the sort, we know that the sort will have finished its build phase when the data gets retrieved from the DMV.
</p>
<p>
Let us look at an example:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.FixedWidth') IS NOT NULL DROP TABLE dbo.FixedWidth;

CREATE TABLE dbo.FixedWidth(
Id INT PRIMARY KEY CLUSTERED,
c1 INT NOT NULL,
fill VARCHAR(1000) NOT NULL
);

INSERT INTO dbo.FixedWidth(Id,c1,fill)
SELECT n,CHECKSUM(NEWID()),REPLICATE('X',192)
FROM dbo.GetNums(10000);

SELECT SUM(DATALENGTH(Id)+DATALENGTH(c1)+DATALENGTH(fill))
FROM dbo.FixedWidth;
 </pre>
</div>
<p>
This snippet creates the table dbo.FixedWidth and inserts 10000 rows into it. It then sums up the DATALENGTH of all columns over all rows.  The size of all the data totals 2,000,000 bytes. It is using <a href="http://www.sqlmag.com/article/sql-server/virtual-auxiliary-table-of-numbers">Itzik Ben-Gan's GetNums function</a>.
</p>
<p>
The following query will sort the data in this table and report its memory usage:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT  ( SELECT  *
          FROM    sys.dm_exec_query_memory_grants
          WHERE   session_id = @@SPID
        FOR
          XML PATH('') ,
              TYPE
        ) Mem ,
        *
FROM    dbo.FixedWidth
ORDER BY c1;
</pre>
</div>
<p>
This query has below execution plan:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Query_with_realtime_sys.dm_exec_query_memory_grants_Access.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Query_with_realtime_sys.dm_exec_query_memory_grants_Access.png" alt="Query with realtime sys.dm exec query memory grants Access TOP N Sort   A Little Bit of Sorting" title="Query with real time access to sys.dm_exec_query_memory_grants" width="1284" height="285" class="aligncenter size-full wp-image-910" /></a>
</p>
<p>
The access to the <span class="tt">sys.dm_exec_query_memory_grants</span> DMV happens after the sort has finished. Therefore the <span class="tt">max_used_memory_kb</span> will tell us the amount actually used for the sort of the entire 2MB table. The information from the memory DMV is returned as a single XML value. As the DMV result is spooled in this query, all rows will show the same XML value:
</p>
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;session_id&gt;66&lt;/session_id&gt;
&lt;request_id&gt;0&lt;/request_id&gt;
&lt;scheduler_id&gt;1&lt;/scheduler_id&gt;
&lt;dop&gt;1&lt;/dop&gt;
&lt;request_time&gt;2012-04-22T19:09:40.303&lt;/request_time&gt;
&lt;grant_time&gt;2012-04-22T19:09:40.307&lt;/grant_time&gt;
&lt;requested_memory_kb&gt;7352&lt;/requested_memory_kb&gt;
&lt;granted_memory_kb&gt;7352&lt;/granted_memory_kb&gt;
&lt;required_memory_kb&gt;512&lt;/required_memory_kb&gt;
&lt;used_memory_kb&gt;2456&lt;/used_memory_kb&gt;
&lt;max_used_memory_kb&gt;2456&lt;/max_used_memory_kb&gt;
&lt;query_cost&gt;3.460044653270174e+000&lt;/query_cost&gt;
&lt;timeout_sec&gt;86&lt;/timeout_sec&gt;
&lt;resource_semaphore_id&gt;0&lt;/resource_semaphore_id&gt;
&lt;plan_handle&gt;BgA6AB2ceA5AIZDfAAAAAAAAAAAAAAAA&lt;/plan_handle&gt;
&lt;sql_handle&gt;AgAAAB2ceA7MMroMHW//TQYN1cSbf5KW&lt;/sql_handle&gt;
&lt;group_id&gt;2&lt;/group_id&gt;
&lt;pool_id&gt;2&lt;/pool_id&gt;
&lt;is_small&gt;0&lt;/is_small&gt;
&lt;ideal_memory_kb&gt;7352&lt;/ideal_memory_kb&gt; 
</pre>
</div>
<p>
It shows that 7,352KB is the ideal memory grant size the optimizer came up with. This was also the amount that was granted. However, only 2,456KB where actually used during the sort.
</p>
<h3>TOP N Sort</h3>
<p>
If you are interested only in the first few rows based on a sort order, you do not need to complete sort the rest of the rows.
To find the top most row it is enough to take the first row of the data set and compare it to each remaining row, keeping each time the one row that comes first according to the sort order. If you have more than one row you can follow the same principle, you just have to keep the first n rows after each step. 
</p>
<p>
SQL Server implements this with the TOP N Sort operator:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT TOP ( 10 )
        *
FROM    dbo.FixedWidth
ORDER BY c1; 
</pre>
</div>
<p>
This simple query has the following execution plan:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/Query_with_TOP_N_Sort.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/04/Query_with_TOP_N_Sort.png" alt="Query with TOP N Sort TOP N Sort   A Little Bit of Sorting" title="Query with TOP N Sort" width="504" height="112" class="aligncenter size-full wp-image-911" /></a>
</p>
<p>
As you can see, the sort operator is of type "TOP N Sort".
</p>
<p>
You can use the same trick as shown above to get its memory consumption, but you have to add an outer layer, otherwise SQL Server removes the TOP N Sort optimization:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT  ( SELECT  *
          FROM    sys.dm_exec_query_memory_grants
          WHERE   session_id = @@SPID
        FOR
          XML PATH('') ,
              TYPE
        ) Mem ,
        *
FROM    ( SELECT TOP ( 10 )
                  *
          FROM    dbo.FixedWidth
          ORDER BY c1
        ) X;
</pre>
</div>
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;session_id&gt;66&lt;/session_id&gt;
&lt;request_id&gt;0&lt;/request_id&gt;
&lt;scheduler_id&gt;1&lt;/scheduler_id&gt;
&lt;dop&gt;1&lt;/dop&gt;
&lt;request_time&gt;2012-04-22T19:25:18.023&lt;/request_time&gt;
&lt;grant_time&gt;2012-04-22T19:25:18.023&lt;/grant_time&gt;
&lt;requested_memory_kb&gt;1024&lt;/requested_memory_kb&gt;
&lt;granted_memory_kb&gt;1024&lt;/granted_memory_kb&gt;
&lt;required_memory_kb&gt;24&lt;/required_memory_kb&gt;
&lt;used_memory_kb&gt;24&lt;/used_memory_kb&gt;
&lt;max_used_memory_kb&gt;24&lt;/max_used_memory_kb&gt;
&lt;query_cost&gt;8.388694532701750e-001&lt;/query_cost&gt;
&lt;timeout_sec&gt;25&lt;/timeout_sec&gt;
&lt;resource_semaphore_id&gt;1&lt;/resource_semaphore_id&gt;
&lt;plan_handle&gt;BgA6AM04yitAYc7jAAAAAAAAAAAAAAAA&lt;/plan_handle&gt;
&lt;sql_handle&gt;AgAAAM04yivnP5lrZVMtEy5tLK/6dHjX&lt;/sql_handle&gt;
&lt;group_id&gt;2&lt;/group_id&gt;
&lt;pool_id&gt;2&lt;/pool_id&gt;
&lt;is_small&gt;1&lt;/is_small&gt;
&lt;ideal_memory_kb&gt;32&lt;/ideal_memory_kb&gt; 
</pre>
</div>
<p>
This query used only 24KB, clearly not enough to hold the entire table, so the TOP N Sort optimization was used. 
</p>
<p>
The algorithms SQL Server uses for sorting are not documented. However, it is clear that an algorithm to find the TOP N rows cannot be as efficient as a normal sort when sorting the same number of rows. For that reason there is a maximum number of rows for which SQL Server uses the TOP N Sort optimization. That number was set to 100.
</p>
<p>
If you run above query with TOP(100) you will get a memory usage of 128KB:
</p>
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;session_id&gt;66&lt;/session_id&gt;
&lt;request_id&gt;0&lt;/request_id&gt;
&lt;scheduler_id&gt;1&lt;/scheduler_id&gt;
&lt;dop&gt;1&lt;/dop&gt;
&lt;request_time&gt;2012-04-22T19:32:38.127&lt;/request_time&gt;
&lt;grant_time&gt;2012-04-22T19:32:38.127&lt;/grant_time&gt;
&lt;requested_memory_kb&gt;1024&lt;/requested_memory_kb&gt;
&lt;granted_memory_kb&gt;1024&lt;/granted_memory_kb&gt;
&lt;required_memory_kb&gt;128&lt;/required_memory_kb&gt;
&lt;used_memory_kb&gt;128&lt;/used_memory_kb&gt;
&lt;max_used_memory_kb&gt;128&lt;/max_used_memory_kb&gt;
&lt;query_cost&gt;8.624926532701749e-001&lt;/query_cost&gt;
&lt;timeout_sec&gt;25&lt;/timeout_sec&gt;
&lt;resource_semaphore_id&gt;1&lt;/resource_semaphore_id&gt;
&lt;plan_handle&gt;BgA6AKCsNDBAIVXkAAAAAAAAAAAAAAAA&lt;/plan_handle&gt;
&lt;sql_handle&gt;AgAAAKCsNDB6vVhQtTSg6/3xIflWWU1M&lt;/sql_handle&gt;
&lt;group_id&gt;2&lt;/group_id&gt;
&lt;pool_id&gt;2&lt;/pool_id&gt;
&lt;is_small&gt;1&lt;/is_small&gt;
&lt;ideal_memory_kb&gt;136&lt;/ideal_memory_kb&gt; 
</pre>
</div>
<p>
 If you use TOP(101) the memory usage goes up to 2,456KB:
</p>
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;session_id&gt;66&lt;/session_id&gt;
&lt;request_id&gt;0&lt;/request_id&gt;
&lt;scheduler_id&gt;1&lt;/scheduler_id&gt;
&lt;dop&gt;1&lt;/dop&gt;
&lt;request_time&gt;2012-04-22T19:32:11.993&lt;/request_time&gt;
&lt;grant_time&gt;2012-04-22T19:32:11.993&lt;/grant_time&gt;
&lt;requested_memory_kb&gt;7352&lt;/requested_memory_kb&gt;
&lt;granted_memory_kb&gt;7352&lt;/granted_memory_kb&gt;
&lt;required_memory_kb&gt;512&lt;/required_memory_kb&gt;
&lt;used_memory_kb&gt;2456&lt;/used_memory_kb&gt;
&lt;max_used_memory_kb&gt;2456&lt;/max_used_memory_kb&gt;
&lt;query_cost&gt;8.627551332701751e-001&lt;/query_cost&gt;
&lt;timeout_sec&gt;25&lt;/timeout_sec&gt;
&lt;resource_semaphore_id&gt;0&lt;/resource_semaphore_id&gt;
&lt;plan_handle&gt;BgA6AMCfYwlA4WnkAAAAAAAAAAAAAAAA&lt;/plan_handle&gt;
&lt;sql_handle&gt;AgAAAMCfYwlb3oB4gW5K1oLplwLKxsA+&lt;/sql_handle&gt;
&lt;group_id&gt;2&lt;/group_id&gt;
&lt;pool_id&gt;2&lt;/pool_id&gt;
&lt;is_small&gt;0&lt;/is_small&gt;
&lt;ideal_memory_kb&gt;7352&lt;/ideal_memory_kb&gt; 
</pre>
</div>
<p>
All the memory numbers match the ones from the whole table sort, so SQL Server decided to execute a standard sort followed by a standard TOP operation instead of using the TOP N Sort optimization.
</p>
<p>
This boundary is fixed and cannot be configured. There is also no hint to force this optimization. That is rather unfortunate as 100 seems to be a quite arbitrary boundary value and a TOP N Sort would often perform a lot better that a full table sort followed by a TOP.
</p>
<h3>Conclusion</h3>
<p>
We all know that while the optimizer in general is very good a finding a (close to) optimal plan, sometimes it does not. 
</p>
<p>
The TOP N Sort optimization is a great enhancement, if you are dealing with 100 or less rows. However at 101 rows you will see a sudden drop in performance. To deal with this, you can either restrict the value of rows to return to 100, if the business allows. Or you can help SQL Server to deal with the "all rows" sort. One of the most obvious options here is to add an index to the table.
</p>
</div>
<img src="http://feeds.feedburner.com/~r/sqlitynet/~4/3uyw39SCuvs" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/908/top-n-sort-a-little-bit-of-sorting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/908/top-n-sort-a-little-bit-of-sorting/</feedburner:origLink></item>
		<item>
		<title>LAG-ging Behind [T-SQL Tuesday #029 - Let's have a SQL Server 2012 party]</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/cBql7CzNlgg/</link>
		<comments>http://sqlity.net/en/879/lag-ging-behind-t-sql-tuesday-029-lets-have-a-sql-server-2012-party/#comments</comments>
		<pubDate>Tue, 10 Apr 2012 14:00:57 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=879</guid>
		<description><![CDATA[This article introduces and explains the new LAG function in SQL Server. It also shows how to use it for SQL Server wait time analysis.]]></description>
			<content:encoded><![CDATA[<div>
<p>
<a href="http://www.nigelpsammy.com/2012/04/t-sql-tuesday-029-lets-have-sql-server.html">
<img height="132" border="0" hspace="9" width="131" alt="SqlTuesday LAG ging Behind [T SQL Tuesday #029   Lets have a SQL Server 2012 party] " title="T-SQL Tuesday #29" src="http://images.sqlity.net/SqlTuesday.png" />
</a>
</p>
<p>T-SQL Tuesday #29 is hosted by Nigel P Sammy (<a href="http://www.nigelpsammy.com/">blog</a>|<a href="https://twitter.com/#!/NigelSammy">twitter</a>). 
This month's topic is "<a href="http://www.nigelpsammy.com/2012/04/t-sql-tuesday-029-lets-have-sql-server.html">Let's have a SQL Server 2012 party</a>".</p>

<h1>Calculating incrementals with the LAG function</h1>
<h3>Introduction</h3>
<p>
With the title of this post I was not trying to imply that I am late in discovering SQL Server 2012 features. 
Instead I was thinking of one small but very useful new feature that allows combining data from several result set rows into a new row. It is the new LAG function. 
</p>	
<p>
 The LAG function is best explained with an example:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT n
INTO #t1
FROM(VALUES(1),(2),(3),(4))X(n);

SELECT n, 
LAG(n,1)OVER(ORDER BY n) [LAG(n,1)], 
LAG(n,2)OVER(ORDER BY n) [LAG(n,2)]
FROM #t1;
</pre>
</div>
<p>
  This returns the result shown below:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Result.png">
<img src="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Result.png" alt="LAG Example 1 Result LAG ging Behind [T SQL Tuesday #029   Lets have a SQL Server 2012 party] " title="Example - Result" width="177" height="118" class="aligncenter size-full wp-image-901" />
</a>
</p>
<p>
  The LAG function takes 2 parameters. The first is the name of the column you are looking for. The second parameter specifies how many rows you want to go back.
</p>
<p> 
  Lag falls into the category of the window functions, so you also need to specify the order in which to process the rows. This is done with the <span class="tt">OVER(ORDER BY n)</span> clause.
</p>
<p>
As above result shows, <span class="tt">LAG(n, 1)</span> returns the value the column n had in the previous row. Similarly, <span class="tt">LAG(n, 2)</span> returns the value the n had two rows ago.
</p>
<p>
The first parameter can actually be any expression. It gets evaluated in the context of the actual row that is specified by the second parameter. There is also a third parameter that specifies the default value that is returned when the requested row does not exist. When the actual value in the context of the requested row is NULL, NULL is returned even if a default was specified.
</p>
<h3>Window Spool</h3>
<p>
  In SQL Server versions before SQL 2012, to achieve this behavior you had to do a self-join for each row to go back to. To mimic the above query you have to write something like this: 
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
WITH A AS
(
 SELECT n, ROW_NUMBER() OVER(ORDER BY n) rn
 FROM #t1
)
SELECT A0.n, A1.n [LAG(n,1)], A2.n [LAG(n,2)]
FROM A A0
LEFT JOIN A A1 ON A0.rn = A1.rn + 1
LEFT JOIN A A2 ON A0.rn = A2.rn + 2;
</pre>
</div>
<p>
This does not only look more complex, it also is a lot more work for SQL Server. Below is the execution plan for above JOIN query:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Execution_Plan_with_Joins.png">
<img src="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Execution_Plan_with_Joins.png" alt="LAG Example 1 Execution Plan with Joins LAG ging Behind [T SQL Tuesday #029   Lets have a SQL Server 2012 party] " title="Example - Execution Plan (with joins)" width="932" height="242" class="aligncenter size-full wp-image-902" />
</a>
</p>
<p>
SQL Server has to scan and sort the table three times. It then joins the three streams together using two Hash Join operators. All of these are fairly expensive operations.
</p>
<p>
The execution plan of the LAG query on the other hand looks like this:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Execution_Plan_with_Window_Spool_for_LAG.png">
<img src="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Execution_Plan_with_Window_Spool_for_LAG.png" alt="LAG Example 1 Execution Plan with Window Spool for LAG LAG ging Behind [T SQL Tuesday #029   Lets have a SQL Server 2012 party] " title="Example - Execution Plan (with Window Spool Operators)" width="1594" height="77" class="aligncenter size-full wp-image-903" />
</a>
</p>
<p>
In this query SQL Server utilizes the new Window Spool operator. Each step size used in a LAG function requires its own Window Spool. Using <span class="tt">LAG(n,1)</span> and <span class="tt">LAG(m,1)</span> with two expressions n and m in the same query requires only one Window Spool operator. Using two different step sizes as in above example requires two Window Spool operators.
</p>
<p>
The Window Spool operator remembers the value of the expression that was specified in the first parameter of the LAG function for the last few rows &ndash; just enough rows to satisfy the LAG requirement.
</p>
<p>
Remembering a few values is clearly a lot less Work, than scanning and sorting the entire table again.
</p>
<h3>How long did you wait?</h3>
<p>
A great use case for this functionality is the analysis of wait times in SQL Server. The <span class="tt">sys.dm_os_wait_stats</span> DMV provides accumulated statistics of the time SQL Server spend waiting since its last restart,
broken down by wait type.
</p>
<p>
A good way to make sense of this information is to capture the wait stats in regular intervals and calculate the delta for each interval. This is now easy with the LAG function:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.WaitStatLog') IS NOT NULL 
  DROP TABLE dbo.WaitStatLog;

CREATE TABLE dbo.WaitStatLog
  (
    Id INT NOT NULL
           IDENTITY(1, 1)
           CONSTRAINT WaitStatLog_PK PRIMARY KEY CLUSTERED ,
    CaptDTime DATETIME2
      NOT NULL
      CONSTRAINT WaitStatLog_CaptDTime_Dflt DEFAULT SYSDATETIME()
  );

IF OBJECT_ID('dbo.WaitStatLogDtl') IS NOT NULL 
  DROP TABLE dbo.WaitStatLogDtl;

CREATE TABLE dbo.WaitStatLogDtl
  (
    WaitStatLogId INT NOT NULL ,
    wait_type NVARCHAR(60) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    waiting_tasks_count BIGINT NOT NULL ,
    wait_time_ms BIGINT NOT NULL ,
    max_wait_time_ms BIGINT NOT NULL ,
    signal_wait_time_ms BIGINT NOT NULL ,
    CONSTRAINT WaitStatLogDtl_PK PRIMARY KEY CLUSTERED
      ( WaitStatLogId, wait_type )
  );

IF OBJECT_ID('dbo.CaptureWaitStats') IS NOT NULL 
  DROP PROCEDURE dbo.CaptureWaitStats;
GO
CREATE PROCEDURE dbo.CaptureWaitStats
AS
BEGIN  
  DECLARE @Id TABLE(Id INT);
  INSERT INTO dbo.WaitStatLog OUTPUT(INSERTED.Id) INTO @Id(Id) DEFAULT VALUES;
  INSERT INTO dbo.WaitStatLogDtl
  SELECT (SELECT Id FROM @Id), *
  FROM sys.dm_os_wait_stats;
END
GO
</pre>
</div>
<p>
The above SQL script creates two tables and a procedure. The procedure (<span class="tt">dbo.CaptureWaitStats</span>) captures the current values from the <span class="tt">sys.dm_os_wait_stats</span> DMV into the two tables. You can either call it manually or setup a job to execute it in regular intervals.
</p>
<p>
After a few values have been collected you can use the following query to calculate the incremental values:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
WITH WaitDelta AS
(
SELECT  L.Id ,
        L.CaptDTime ,
        DATEDIFF(millisecond,LAG(L.CaptDTime, 1) OVER ( PARTITION BY D.wait_type ORDER BY L.Id ),L.CaptDTime) Interval,
        D.wait_type ,
        D.max_wait_time_ms ,
        D.waiting_tasks_count - LAG(D.waiting_tasks_count, 1) OVER ( PARTITION BY D.wait_type ORDER BY L.Id ) waiting_tasks_count,
        D.wait_time_ms - LAG(D.wait_time_ms, 1) OVER ( PARTITION BY D.wait_type ORDER BY L.Id ) wait_time_ms,
        D.signal_wait_time_ms - LAG(D.signal_wait_time_ms, 1) OVER ( PARTITION BY D.wait_type ORDER BY L.Id ) signal_wait_time_ms
FROM    dbo.WaitStatLog L
        JOIN dbo.WaitStatLogDtl D ON L.Id = D.WaitStatLogId
),
WaitTop AS
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY Id ORDER BY wait_time_ms DESC) rn
FROM WaitDelta
WHERE waiting_tasks_count &gt; 0
AND wait_type NOT IN (
        'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
        'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
        'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
        'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
        'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',
        'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP','SLEEP_BPOOL_FLUSH',
        'CXPACKET','DBMIRROR_EVENTS_QUEUE','DBMIRRORING_CMD')
)
SELECT *
FROM WaitTop
WHERE rn&lt;6
ORDER BY Id DESC;
</pre>
</div>
<p>
The query uses the LAG function to calculate the delta values. The <span class="tt">PARTITION BY wait_type</span> clause makes sure that only values of the same wait type are used to calculate a delta. Because all LAG invocations have the same partitioning and sort order and use the same step size, only a single Window Spool is required by this query.  
</p>
<p>
The query returns only the top five wait types for each interval; this is done using the ROW_NUMBER function. It also filters out a few wait types that happen frequently in a healthy SQL Server installation (see <a href="http://www.sqlskills.com/BLOGS/PAUL/post/Wait-statistics-or-please-tell-me-where-it-hurts.aspx" target="_blank">Wait statistics, or please tell me where it hurts</a> for one source of this exclusion list).
</p>
<h3>Conclusion</h3>
<p>
SQL Server's new LAG function provides an easy way to calculate incrementals. It makes the actual coding simpler and also significantly reduces the amount of work that SQL Server itself has to do, compared to the old JOIN method.
</p>
<p>
This is a great tool for analyzing SQL Server performance data that tends to be accessible only in accumulated values.
</p>
<p>
There is also a new LEAD function available that is basically doing the same thing but allows to access values from rows ahead instead of behind.
</p>
</div>
<img src="http://feeds.feedburner.com/~r/sqlitynet/~4/cBql7CzNlgg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/879/lag-ging-behind-t-sql-tuesday-029-lets-have-a-sql-server-2012-party/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/879/lag-ging-behind-t-sql-tuesday-029-lets-have-a-sql-server-2012-party/</feedburner:origLink></item>
		<item>
		<title>2012-04-12: Index Internals – SQL PASS Regionalgruppe Hamburg</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/oO8zwRSbPzo/</link>
		<comments>http://sqlity.net/en/872/2012-04-12-index-internals-sql-pass-regionalgruppe-hamburg/#comments</comments>
		<pubDate>Mon, 09 Apr 2012 10:35:43 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=872</guid>
		<description><![CDATA[TitleIndex Internals - How SQL Server stores and accesses data When12.April.2012 um 18:00 Where 70. Treffen der SQL PASS Regionalgruppe Hamburg Gasstraße 6 22761 Hamburg Microsoft Deutschland Geschäftsstelle Hamburg]]></description>
			<content:encoded><![CDATA[<style type="text/css"><!-- .WhenAndWhere td {padding:5px; border:1px solid #000;border-radius:3px;} --></style>
<table  class="WhenAndWhere">
<tr>
<td>Title</td><td><a href="http://sqlity.net/en/592/presentation-materials-index-internals/">Index Internals - How SQL Server stores and accesses data</a></td>
</tr>
<tr>
<td>When</td><td>12.April.2012 um 18:00</td>
</tr><tr>
<td>Where</td><td>
<div>
<a href="http://www.sqlpass.de/Regionen/Deutschland/Hamburg.aspx">70. Treffen der SQL PASS Regionalgruppe Hamburg</a>
<br />Gasstraße 6 22761 Hamburg
<br />Microsoft Deutschland Geschäftsstelle Hamburg
</div>
</td>
</tr>
</table><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/oO8zwRSbPzo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/872/2012-04-12-index-internals-sql-pass-regionalgruppe-hamburg/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/872/2012-04-12-index-internals-sql-pass-regionalgruppe-hamburg/</feedburner:origLink></item>
		<item>
		<title>How to Calculate MAXINT</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/h2o4SZ27x8c/</link>
		<comments>http://sqlity.net/en/858/how-to-calculate-maxint/#comments</comments>
		<pubDate>Sun, 08 Apr 2012 00:28:51 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=858</guid>
		<description><![CDATA[Trying to directly calculate MAXINT for the BIGINT data type in SQL Server will either cause an arithmetic overflow error or produce inaccurate values. This article explains how to calculate this value accurately.]]></description>
			<content:encoded><![CDATA[<div>
<h3>Introduction</h3>
<p>
MAXINT or INT_MAX is the highest number that can be represented by a given integer data type. In SQL Server this number for the INT data type is 2,147,483,647. The highest number you can store using the BIGINT data type is 9,223,372,036,854,775,807.
</p>
<p>
The question I would like to look at today is the following: How can I calculate MAXINT without utilizing another datatype.
</p>
<h3>Signed Numbers</h3>
<p>
A signed integer allows to store positive and negative numbers, while an unsigned integer can only represent positive numbers.
All signed integer data types use a single bit to indicate if the number is positive or negative. In a 32 bit integer like SQL Servers INT data type 31 bits are left to encode the actual number. That means we can store 2^31 different numbers. With zero being the smallest non negative number, the largest number for the INT data type is 2^31-1 = 2,147,483,647.
</p>
<h3>Calculations</h3>
<p>
The trouble with this formula is, that when trying to directly calculate this number, you will have an intermediate result that is out of the range of the data type. 
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT POWER(2,31)-1;
</pre>
</div>
<p>
This SELECT statement will cause an arithmetic overflow error:
</p>
<p>
<pre>
Msg 232, Level 16, State 3, Line 1
Arithmetic overflow error for type int, value = 2147483648.000000.
</pre>
</p>
<p>
One of the ways to deal with that is to use a data type that can store larger numbers like the float datatype:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT POWER(2.,31)-1
</pre>
</div>
<p>
That works for the INT datatype. However, as there is no numerical datatype in SQL Server that uses more than 64 bits to represent a value, any attempt to do this for BIGINT will result in an inaccurate estimation of the real value:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT POWER(2.,63)-1
</pre>
</div>
<p>
This results in 9,223,372,036,854,775,799 which is 8 under the accurate value of 9,223,372,036,854,775,807.
</p>
<p>
This raises the question I mentioned earlier: Is there a way to calculate MAXINT without using a different data type? 
</p>
<h3>Binary Representations</h3>
<p>
From here on I will use MAXINT to represent the highest number for the SQL Server INT datatype and MAXBIGINT to do the same for the SQL Server BIGINT data type.
</p>
<p>
If you display MAXINT as binary number it looks like this
</p>
<p>
<pre>1111111111111111111111111111111</pre>
</p>
<p>
Each digit in a binary number represents a specific power of two. The right most digit in above number stands for 2^0 and the left most for 2^31. To calculate the decimal value for a binary number you just add the "power of two"-values for each digit multiplied with the value of that digit. So binary 1011 would be 1*2^3 + 0*2^2 = 1*2^1 + 1*2^0 = 8+0+2=1 = 11. That means, to get to MAXINT, you can just add all 31 powers of two from 2^0 to 2^30:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT SUM(POWER(2,n-1)) FROM dbo.GetNums(31);
</pre>
</div>
<p>
The code uses <a href="http://www.sqlmag.com/article/sql-server/virtual-auxiliary-table-of-numbers" title="Get GetNUms here!" target="_blank">Itzik Ben-Gan's GetNums function</a> to produce a list of all numbers from 1 to 31. That allows us to calculate the "power of 2"-values from 2^0 to 2^30 and add them all together using the SUM aggregate function.
</p>
<p>
The above solution works for MAXINT. It also works for MAXBIGINT if you first CAST the literal 2 to BIGINT. However, this method of calculation seems overly complicated.
</p>
<h3>Two's Complement</h3>
<p>
So far we have only looked at positive numbers. We said already that negative numbers are represented by using a single bit as an indicator. This is done with the <a href="http://en.wikipedia.org/wiki/Two's_complement" title="Two's Complement on Wikipedia" target="_blank">Two's Complement</a> encoding. I will not go into more detail about this encoding here, but it being used means that we can represent as many different negative as non negative numbers. As zero is not a negative number the possible values for negative numbers range from -(2^0) to -(2^31) for INT and -(2^0) to -(2^63) for BIGINT. So while we can't represent 2^63, we can represent -(2^63) with the BIGINT data type. 
</p>
<p>
To calculate -(2^63) we can use the fact that an uneven power of a negative number is again negative. That means that -(2^63) = (-2)^63. To get to MAXBIGINT we now just need to subtract this value from -1: 
</p>
<p>
<pre>
-1 - (-2)^63 = -1 - (-(2^63)) = -1 + 2^63
</pre>
</p>
<p>
In T-SQL the calculation looks like this for MAXINT:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT -1-POWER(-2,31);
</pre>
</div>
<p>
For MAXBIGINT we again need to first cast the literal 2 to BIGINT to tell SQL Server that this is the data type we would like to work in:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT -1-POWER(CAST(-2 AS BIGINT),63);
</pre>
</div>
<p>
In above calculations it is important to keep the order of the steps. Writing -POWER(-2,31)-1, while mathematically equivalent will cause SQL Server to first calculate 2^31 which will instantly cause the dreaded Arithmetic Overflow error.
</p>
<h3>Conclusion</h3>
<p>
While there is no way to directly calculate MAXBIGINT in SQL Server due to data type limitations, exploiting the fact that the range for integer data types for numbers smaller that zero is one bigger than the range for numbers greater than zero allows us to calculate this value within the ranges of the BIGINT data type.
</p>
</div><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/h2o4SZ27x8c" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/858/how-to-calculate-maxint/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/858/how-to-calculate-maxint/</feedburner:origLink></item>
		<item>
		<title>Optimizer, what if I had more CPUs?</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/BmIQPl8tR3A/</link>
		<comments>http://sqlity.net/en/828/optimizer-what-if-i-had-more-cpus/#comments</comments>
		<pubDate>Tue, 27 Mar 2012 14:00:45 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server Internals]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=828</guid>
		<description><![CDATA[In the March issue of SQL Server Pro Itzik Ben-Gan wrote about an undocumented DBCC command that was recently discovered in the wild by Eladio Rincon (http://www.sqlmag.com/article/sql-server/options-affecting-parallelism-141505). Itzik closed the article by expressing his hope to get this statement documented as it can be very helpfull to database professionals trying to performance tune a query <a href="http://sqlity.net/en/828/optimizer-what-if-i-had-more-cpus/#more-'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div>
<p>
In the March issue of SQL Server Pro Itzik Ben-Gan wrote about an undocumented DBCC command that was recently discovered in the wild by Eladio Rincon (<a href="http://www.sqlmag.com/article/sql-server/options-affecting-parallelism-141505" target="_blank">http://www.sqlmag.com/article/sql-server/options-affecting-parallelism-141505</a>). 
Itzik closed the article by expressing his hope to get this statement documented as it can be very helpfull to database professionals trying to performance tune
a query on a none production system, that does not completely match the production environment.
</p>
<p>
I agree with this opinion, and so I decided to spend some time investigating this functionality further.
<p>
<h3>Optimizer_WhatIf?</h3>
<p>
The DBCC command is the OPTIMIZER_WHATIF cammand. DBCC HELP gives you the following information about it:
</p>
<p>
<pre>
dbcc OPTIMIZER_WHATIF ({property/cost_number | property_name} [, {integer_value | string_value} ])
</pre>
</p>
<p>
This is &ndash; according to Google &ndash; the only publicly available information about this command. Eladio discovered it in use at a customer site. 
The customer was told by Microsoft that with the first parameter equal to one, the second would influence parallelism decisions of the optimizer. 
</p>
<p>
After reading Itziks article it seemed clear that the "1" meant "number of schedulers" or "number of CPUs". 
The above documentation snippet says that you can either specify a number or a name for the first parameter, so knowing the possible non-numeric values
for it would be a start in understanding what else this command does.
</p>
<h3>Finding Strings</h3>
<p>
Every windows programm has to somehow contain the strings it is using. SQL Server is no exception. A well designed program will have related strings close by each other.
</p>
<p>
You can use a program like <a href="http://technet.microsoft.com/en-us/sysinternals/bb896653" target="_blank">Mark Russinovich's Process Explorer</a>
to get a list of all strings in the memory space of a running process. 
</p>
<p>
As all statements (including DBCC commands) are strings, you can expect to find the possible values for the first parameter next to each other. With that thought in mind I went ahead and 
searched for strings containing "CPU" or "scheduler". I found this list of possible property_name values for SQL Server 2008 and SQL Server 2008 R2:
<pre>
Status
ResetAll
CPUs
MemoryMBs
Bits
ParallelCardThreshold
ParallelCostThreshold
ParallelCardCrossProd
</pre>
In SQL Server 2012 these properties were added.
<pre>
LowCEThresholdFactorBy10
HighCEThresholdFactorBy10
CEThresholdFactorCrossover
DMLChangeThreshold
</pre>
DBCC OPTIMIZER_WHATIF was available in SQL Server 2005 as well, but "ResetAll" and "Status" are not working, so I did not investigate further.
</p>
<h5>Warning</h5>
<p>
All property names are case sensitive, even in a case insensitive installation! You also will not get an error message if an invalid name or value is passed in, so be careful.
</p>
<h3>Many More CPUs</h3>
<p>
The third one, "CPUs" causes the functionality described by Itzik: If you run 
<pre>DBCC OPTIMIZER_WHATIF(CPUs, 16) WITH NO_INFOMSGS;</pre>
the optimizer will subsequently optimizer all statements submitted in the same session as if the system had 16 CPUs.
</p>
<h3>Administrative Help</h3>
<p>
If you run
<pre>DBCC OPTIMIZER_WHATIF(ResetAll) WITH NO_INFOMSGS;</pre>
all changes caused by this DBCC command to the current session are undone.
</p>
<p>
<pre>DBCC OPTIMIZER_WHATIF(Status) WITH NO_INFOMSGS;</pre>
finally gives you the currently set values and their defaults. It also returns the integer representation of each property:
<pre>
---------------------------------------------------------

Optimizer what-if status

---------------------------------------------------------

property_number current_value default_value property_name

---------------------------------------------------------

         1                  0             0 CPUs

         2                  0             0 MemoryMBs

         3                  0             0 Bits

         4               1000          1000 ParallelCardThreshold

         5                  1             1 ParallelCostThreshold

         6                200           200 ParallelCardCrossProd

         7                 50            50 LowCEThresholdFactorBy10

         8                 12            12 HighCEThresholdFactorBy10

         9             100000        100000 CEThresholdFactorCrossover

        10                 10            10 DMLChangeThreshold
</pre>
(As with all undocumented DBCC commands, you first need to run "DBCC TRACEON (3604) WITH NO_INFOMSGS;" in your session to get any output.)
</p>
<h3>Conclusion</h3>
<p>
This DBCC command is a great tool, when you are trying to answer questions like: What if the optimizer had more (or less) CPUs available than there are in my current system.
</p>
<p>
As with all undocumented functionality however, don't use this in production unless advised to do so by Microsoft support.
</p>
<p>
This post covered only three of the possible values for the first parameter. The other ones will be addressed in future posts.
</p>
</div><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/BmIQPl8tR3A" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/828/optimizer-what-if-i-had-more-cpus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/828/optimizer-what-if-i-had-more-cpus/</feedburner:origLink></item>
		<item>
		<title>Key Lookup Operator in Update Statements</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/2WGD2PdqnlA/</link>
		<comments>http://sqlity.net/en/810/key-lookup-operator-in-update-statements/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 14:00:39 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server Internals]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=810</guid>
		<description><![CDATA[When thinking about the indexes for update statements it is easy to overlook an important peculiarity in the way SQL Server finds the rows to update before updating them. Those pesky Lookups If you have an UPDATE statement that filters on two columns and you have an index on one of them, you might be <a href="http://sqlity.net/en/810/key-lookup-operator-in-update-statements/#more-'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div>
<p>
When thinking about the indexes for update statements it is easy to overlook an important peculiarity in the way SQL Server finds the rows to update before updating them.
</p>
<h3>Those pesky Lookups</h3>
<p>
If you have an UPDATE statement that filters on two columns and you have an index on one of them, you might be tempted to think that the necessary lookup for the second column is not going to cost a lot - particular when that second column is not selective at all - as SQL Server has to access the storage place of each row anyway when executing the actual update. This however is not true, as SQL Server executes the update in two separate phases: One to identify the rows and one to update those rows. That means a lookup in the find phase can be very expensive.
</p>
<p>
Let us look at an example. Use the following script to create two identical tables and insert 200,000 rows into each of them.
</p>
<div>
<pre class="brush: sql; title: ; notranslate">

CREATE TABLE dbo.tst1(
 Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
 Val INT,
 Filter INT,
 IdxKey INT,
 Fill CHAR(1000) DEFAULT 'Fill'
);

CREATE TABLE dbo.tst2(
 Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
 Val INT,
 Filter INT,
 IdxKey INT,
 Fill CHAR(1000) DEFAULT 'Fill'
);

INSERT INTO dbo.tst1(Val,Filter,IdxKey)
OUTPUT INSERTED.Val,INSERTED.Filter,INSERTED.IdxKey INTO dbo.tst2(Val,Filter,IdxKey)
SELECT 0,0, n%1000
FROM dbo.GetNums(200000);
</pre>
</div>
<p>
The Filter column will be used to force the lookup later on by adding "Filter = 0" to the where clause. It is valued 0 in all rows, so no row will be excluded by this filter.
</p>
<p>
After creating the tables, create the following indexes:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
CREATE INDEX dbo_tst1_IdxKey ON dbo.tst1(IdxKey);
CREATE INDEX dbo_tst2_IdxKey ON dbo.tst2(IdxKey) INCLUDE (Filter);
</pre>
</div>
<p>
Both indexes are on the IdxKey column, but the index on the dbo.tst2 table also includes the Filter column.
</p>
<p>
Now run the following two identical update statements against the two tables:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
UPDATE dbo.tst1
  SET Val = 17
WHERE IdxKey &lt;30 AND Filter = 0;

UPDATE dbo.tst2
  SET Val = 17
WHERE IdxKey &lt;30 AND Filter = 0;
</pre>
</div>
<p>
The first one requires a lookup operator as the Filter column is not included in the index on dbo.tst1. Below are the two execution plans:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/03/update_with_bookmark_lookup.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/03/update_with_bookmark_lookup.png" alt="update with bookmark lookup Key Lookup Operator in Update Statements" title="Update with Bookmark Lookup" width="1451" height="275" class="aligncenter size-full wp-image-813" /></a>
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/03/update_without_bookmark_lookup.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/03/update_without_bookmark_lookup.png" alt="update without bookmark lookup Key Lookup Operator in Update Statements" title="Update without Bookmark Lookup" width="1210" height="131" class="aligncenter size-full wp-image-812" /></a>
</p>
<p>
As you can clearly see, the update operator is separated from the data retrieval operator(s) by several other operators. It looks like all the filtering and value preparations happen before the update operator which is then just updating all rows passed to it without any further filtering.
</p>
<h3>Statistics</h3>
<p>
I ran the above statements in a loop, each executing 2000 times and recorded time and logical reads. The averaged results are below:
</p>
<p>
<table>
  <thead>
    <tr>
      <th class="cpu_time-cell">cpu_time</th>
      <th class="total_elapsed_time-cell">total_elapsed_time</th>
      <th class="logical_reads-cell">logical_reads</th>
      <th class="exec_count-cell">exec_count</th>
      <th class="Cmd-cell">command</th>
    </tr>
  </thead>
  <tbody>
    <tr class="firstRow">
      <td class="cpu_time-cell">98</td>
      <td class="total_elapsed_time-cell">175</td>
      <td class="logical_reads-cell">37212</td>
      <td class="exec_count-cell">2000</td>
      <td class="Cmd-cell">  UPDATE dbo.tst1    SET Val = 17  WHERE IdxKey <30 AND Filter = 0;  </td>
    </tr>
    <tr class="lastRow">
      <td class="cpu_time-cell">47</td>
      <td class="total_elapsed_time-cell">72</td>
      <td class="logical_reads-cell">19211</td>
      <td class="exec_count-cell">2000</td>
      <td class="Cmd-cell">  UPDATE dbo.tst2    SET Val = 17  WHERE IdxKey <30 AND Filter = 0;  </td>
    </tr>
  </tbody>
</table>
</p>
<p>
The cpu time, the elapsed time and the logical reads were each about twice as high in the query against dbo.tst1, the one requiring the lookup operator. This was the case even though the lookup did not actually filter any rows - every row accessed by the lookup had to be updated as well.
</p>
<h3>Conclusion</h3>
<p>
The above statistics prove what the execution plan already suggested: The finding of rows for an update and the update itself are separate steps in SQL Server update queries. That means the "find" portion has to be optimized as if it were a standalone SELECT. In particular, lookup operations can not be ignored as SQL Server accesses the rows twice in this case: Once for the lookup and once for the update.
</p>
</div><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/2WGD2PdqnlA" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/810/key-lookup-operator-in-update-statements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/810/key-lookup-operator-in-update-statements/</feedburner:origLink></item>
	</channel>
</rss>

