<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>sqlity.net</title>
	
	<link>http://sqlity.net/en</link>
	<description>Quality for SQL</description>
	<lastBuildDate>Sun, 19 Feb 2012 16:06:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/sqlitynet" /><feedburner:info uri="sqlitynet" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>Deferred Deallocations of Pages in TempDb</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/cb8IerYMilc/</link>
		<comments>http://sqlity.net/en/671/deferred-deallocations-of-pages-in-tempdb/#comments</comments>
		<pubDate>Sun, 19 Feb 2012 15:00:53 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[SQL Server Internals]]></category>
		<category><![CDATA[TempDb]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=671</guid>
		<description><![CDATA[The other day I ran into a question on msdn:Tempdb Deallocation (dm_db_session_space_usage). A person with the pseudonym DontPageMeBro noticed an odd behavior in his SQL Server installation: The sum of the reported usage of tempdb across all sessions was often bigger than tempdb itself. Something is clearly off here, so I figured it was worth <a href="http://sqlity.net/en/671/deferred-deallocations-of-pages-in-tempdb/#more-671'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div>
<p>
The other day I ran into a question on msdn:<a href="http://social.msdn.microsoft.com/Forums/en-US/sqldatabaseengine/thread/af168419-5ad9-44eb-b377-fb5560c76a1b" target="_blank">Tempdb Deallocation (dm_db_session_space_usage)</a>. A person with the pseudonym DontPageMeBro noticed an odd behavior in his SQL Server installation: The sum of the reported usage of tempdb across all sessions was often bigger than tempdb itself. Something is clearly off here, so I figured it was worth digging a little deeper. 
</p>
<h3>TempDb Usage</h3>
<p>
First let us look at how to get information about tempdb usage. SQL Server provides two DMVs that return information about the space used in tempdb:
<pre>sys.dm_db_session_space_usage</pre> and <pre>sys.dm_db_task_space_usage</pre></p>
<p>
sys.dm_db_task_space_usage reports the amount of pages taken and released by a task currently running. If a request uses parallelism you might see more than one row for that request returned by this DMV.
After each task finishes, it's numbers get rolled up into the sys.dm_db_session_space_usage. So here you can see how much space in tempdb a request used in total, but this information is only available after the request is completed.
</p>
<p>
Both DMVs report the number of pages in four columns: 
<pre>user_objects_alloc_page_count, 
user_objects_dealloc_page_count, 
internal_objects_alloc_page_count, 
internal_objects_dealloc_page_count</pre>
The two "user" columns report on spaced used by user created objects like #temptables. The two "internal" columns report on objects SQL Server decided to create to support the execution of a query. Both pair are made up of a "alloc" and a "dealloc" page count column. The former is the total count of all pages that where allocated in the scope of the DMV whereas the latter contains the total count of all deallocated pages.
</p>
<p>
So, to get the current user object space allocation in tempdb for a given session, you need to subtract the "user_objects_dealloc_page_count" from the "user_objects_alloc_page_count" in both DMVs and then SUM all those values together. The following query shows how to do that for user as well as for internal space used:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT  session_id ,
        SUM(user_objects_alloc_page_count - user_objects_dealloc_page_count) CurrentUserPages,
        SUM(internal_objects_alloc_page_count - internal_objects_dealloc_page_count) CurrentInternalPages
FROM    ( SELECT  session_id ,
                  user_objects_alloc_page_count ,
                  user_objects_dealloc_page_count ,
                  internal_objects_alloc_page_count ,
                  internal_objects_dealloc_page_count
          FROM    sys.dm_db_session_space_usage
          UNION ALL
          SELECT  session_id ,
                  user_objects_alloc_page_count ,
                  user_objects_dealloc_page_count ,
                  internal_objects_alloc_page_count ,
                  internal_objects_dealloc_page_count
          FROM    sys.dm_db_task_space_usage
        ) X
GROUP BY X.session_id
ORDER BY X.session_id ASC ;
</pre>
</div>
<p>
DontPageMeBro is dealing with the problem that some users create large #temptables and then leave the session open without dropping those tables. If you have a lot of users like that your tempdb might grow to a significant size. Because it is always advisable to have tempdb on very fast storage, this problem can mean a significant cost to the business, so it makes sense to follow up and see how big the problem really is. The sys.dm_db_session_space_usage can give us that information. Every time there is a session that has more allocated than deallocated pages, that session is holding on to a temporary object. If the last request for that session has finished a while ago, that means, that space is not actually in use and those objects should have been dropped.
</p>
<h3>Counting Inaccuracies</h3>
<p>
So far the theory. However, as DontPageMeBro noticed, if a #temptable is big it seems that SQL Server sometimes forgets to count the page deallocation. So, even though the #temptable is gone, the pages are still being counted as in use by the session that created the table.
</p>
<p>
To understand what is going on here, we need to look at how SQL Server handles page deallocations after dropping an object. 
</p>
<p>
Each Table consists of at least one allocation_unit. An allocation_unit contains for example all the data pages for a given table or index. There might also be additional allocation_units holding special pages for e.g. row overflow data. Every index in turn has it's own set of allocation_units.
</p>
<p>
When dropping an object, all allocation_units belonging to that object are deallocated and all their pages are marked as free. This can take quite some time for big objects and usually this cleanup work is done synchronously, which means that the session issuing the drop request has to wait. However if the allocation unit contains more than 1024 pages, this cleanup work is deferred. The allocation_unit is marked as "DROPPED" and the link to the table is removed at the time the drop statement executes. The actual work of freeing up all those pages is then done by a cleanup process later on. This process is the same in all databases, not only tempdb.
</p>
<p>
Lets look at an example to demonstrate this:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
CREATE TABLE #temp
  (
    id INT IDENTITY ,
    name VARCHAR(100) 
  );
GO
SELECT  allocation_unit_id
INTO    #aus
FROM    tempdb.sys.system_internals_allocation_units u
        JOIN tempdb.sys.system_internals_partitions p ON u.container_id = p.partition_id
WHERE   p.object_id = OBJECT_ID('tempdb..#temp');
GO
INSERT  INTO #temp
        SELECT TOP ( 328168 )
                'FillText'
        FROM    sys.system_internals_partition_columns a ,
                sys.system_internals_partition_columns;
GO
SELECT  'A' A , *
FROM    sys.dm_db_session_space_usage
WHERE   session_id = @@SPID;

SELECT  'A' A , u.* , p.rows
FROM    tempdb.sys.system_internals_allocation_units u
        JOIN tempdb.sys.system_internals_partitions p ON u.container_id = p.partition_id
        JOIN #aus t ON u.allocation_unit_id = t.allocation_unit_id;
GO
DROP TABLE #temp;
GO
SELECT  'B' B , *
FROM    sys.dm_db_session_space_usage
WHERE   session_id = @@SPID;

SELECT  'B' B , u.*
FROM    tempdb.sys.system_internals_allocation_units u
        JOIN #aus t ON u.allocation_unit_id = t.allocation_unit_id;
IF ( @@ROWCOUNT &gt; 0 ) 
  BEGIN
    WAITFOR DELAY '00:00:06' ;
    SELECT  'C' C , *
    FROM    sys.dm_db_session_space_usage
    WHERE   session_id = @@SPID;
    
    SELECT  'C' C , *
    FROM    tempdb.sys.system_internals_allocation_units u
            JOIN #aus t ON u.allocation_unit_id = t.allocation_unit_id;
  END ;
GO

DROP TABLE #aus;
</pre>
</div>
<p>
This script creates the table #temp, fills it with a few thousand rows and then drops it again. If you run this script you will get the following output back:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/02/direct_AU_drop.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/02/direct_AU_drop.png" alt="direct AU drop Deferred Deallocations of Pages in TempDb " title="direct_AU_drop" width="1088" height="255" class="alignleft size-full wp-image-676" /></a>
</p>
<p>
The first result-set shows that the current session has 1019 pages currently reserved in tempdb. The second result-set tells us that the table #temp contains 328168 rows that are stored in a single allocation unit of type "IN_ROW_DATA" with 1018 pages.
</p>
<p>
The third and forth result-set were captured after the table was dropped. Here we can see that 1017 pages were deallocated in tempdb for the current connection and that the allocation_unit is now gone. (That not all pages of #temp show as being dropped in the count is due to another optimization that is not part of today's discussion.)
</p>
<p>
If we change the number of rows inserted (line 14 in above script) from 328168 to 328169 we get this output instead:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/02/deferred_AU_drop.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/02/deferred_AU_drop.png" alt="deferred AU drop Deferred Deallocations of Pages in TempDb " title="deferred_AU_drop" width="1089" height="303" class="alignleft size-full wp-image-677" /></a>
</p>
<p>
This time the allocation_unit has 1026 pages, so we would expect the deferred drop to take place. Result-set 4 now shows, directly after the drop table took place, that the allocation_unit is still there, but it's type changed to "DROPPED". The link to the table it once belonged to is also removed, as is visible in the container_id column.
</p>
<p>
If the above script sees that the allocation_unit still exists after the drop (see line 37) it waits for a few seconds and then looks again. In result-set 6 you can now see that the allocation_unit got removed. However the current session still shows 1027 allocated and no deallocated pages (result-set 5).
</p>
<h3>Cleaning Up</h3>
<p>
To see the cleanup process in action you can use the following script:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
CREATE TABLE #temp
  (
    id INT IDENTITY ,
    name VARCHAR(100) 
  );
GO
SELECT  allocation_unit_id
INTO    #aus
FROM    tempdb.sys.system_internals_allocation_units u
        JOIN tempdb.sys.system_internals_partitions p ON u.container_id = p.partition_id
WHERE   p.object_id = OBJECT_ID('tempdb..#temp');
GO
INSERT  INTO #temp
        SELECT TOP ( 328169 )
                'FillText'
        FROM    sys.system_internals_partition_columns a ,
                sys.system_internals_partition_columns;
GO
SELECT  'A' A , *
FROM    sys.dm_db_session_space_usage
WHERE   session_id = @@SPID;

SELECT  'A' A , u.* , p.rows
FROM    tempdb.sys.system_internals_allocation_units u
        JOIN tempdb.sys.system_internals_partitions p ON u.container_id = p.partition_id
        JOIN #aus t ON u.allocation_unit_id = t.allocation_unit_id;
GO
DROP TABLE #temp;

SELECT  IDENTITY(INT,1,1) Id,SYSDATETIME()now, u.allocation_unit_id , data_pages , type_desc
INTO    #siau
FROM    tempdb.sys.system_internals_allocation_units u
        JOIN #aus t ON u.allocation_unit_id = t.allocation_unit_id ;

SELECT IDENTITY(INT,1,1) Id,SYSDATETIME()now, *
INTO #ssu
 FROM  sys.dm_db_session_space_usage 
WHERE user_objects_dealloc_page_count&gt;0 OR session_id = @@SPID
ORDER BY session_id;

SELECT IDENTITY(INT,1,1) Id,SYSDATETIME()now, tsu.*,r.status,r.command
INTO #tsu
 FROM  sys.dm_db_task_space_usage tsu
 JOIN sys.dm_exec_requests r
 ON tsu.session_id = r.session_id
WHERE user_objects_dealloc_page_count&gt;0
ORDER BY session_id;

DECLARE @AuId BIGINT = (SELECT TOP(1) allocation_unit_id FROM #aus);
DECLARE @Stop DATETIME2 = DATEADD(SECOND,10,SYSDATETIME());
WHILE (SYSDATETIME()&lt;@Stop)
BEGIN
  INSERT   INTO    #siau
  SELECT  SYSDATETIME()now, u.allocation_unit_id , data_pages , type_desc
  FROM    tempdb.sys.system_internals_allocation_units u
  WHERE u.allocation_unit_id = @AuId;

  INSERT INTO #ssu
  SELECT SYSDATETIME()now, *
   FROM  sys.dm_db_session_space_usage 
  WHERE user_objects_dealloc_page_count&gt;0 OR session_id = @@SPID
  ORDER BY session_id;

  INSERT INTO #tsu
  SELECT SYSDATETIME()now, tsu.*,r.status,r.command
   FROM  sys.dm_db_task_space_usage tsu
   JOIN sys.dm_exec_requests r
   ON tsu.session_id = r.session_id
   AND tsu.request_id = r.request_id
  WHERE user_objects_dealloc_page_count&gt;0
  ORDER BY session_id;
END

SELECT MIN(now) MinNow,MAX(now) MaxNow,type_desc,data_pages,allocation_unit_id
FROM #siau
GROUP BY type_desc,data_pages,allocation_unit_id
ORDER BY MIN(Id) ASC;

SELECT MIN(now) MinNow,MAX(now) MaxNow,session_id,database_id,user_objects_alloc_page_count,user_objects_dealloc_page_count,internal_objects_alloc_page_count,internal_objects_dealloc_page_count 
FROM #ssu 
GROUP BY session_id,database_id,user_objects_alloc_page_count,user_objects_dealloc_page_count,internal_objects_alloc_page_count,internal_objects_dealloc_page_count
ORDER BY session_id ASC, MIN(Id) ASC;

SELECT MIN(now) MinNow,MAX(now) MaxNow,status,command,session_id,request_id,exec_context_id,database_id,user_objects_alloc_page_count,user_objects_dealloc_page_count,internal_objects_alloc_page_count,internal_objects_dealloc_page_count
FROM #tsu 
GROUP BY status,command,session_id,request_id,exec_context_id,database_id,user_objects_alloc_page_count,user_objects_dealloc_page_count,internal_objects_alloc_page_count,internal_objects_dealloc_page_count
ORDER BY session_id ASC, MIN(Id) ASC;

GO
DROP TABLE #ssu;
DROP TABLE #tsu;
DROP TABLE #aus;
DROP TABLE #siau;
</pre>
</div>
<p>
This script again creates the same table #temp that was used by the previous script, fills it and drops it. After dropping the table it records for ten seconds information from the allocation_unit DMV and the two space_usage DMVs. After those ten seconds it produces this output:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/02/deferred_AU_drop_verbose.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/02/deferred_AU_drop_verbose.png" alt="deferred AU drop verbose Deferred Deallocations of Pages in TempDb " title="deferred_AU_drop_verbose" width="1075" height="585" class="alignleft size-full wp-image-682" /></a>
</p>
<p>
The first two result-sets show us the same information about the table that we have seen before. The other three result-sets contain grouped information of the three recorded DMVs. Each is grouped by all columns but the time of capture. That means, each row shows when those values where visible first and last, so you can see the progression of events.
</p>
<p>
The third result-set tells us that right after dropping the table the allocation_unit was marked as dropped. It at this point also shows a size of 0 pages. Just a little under a second later, at 17:18:06.1417576, the allocation_unit was seen last. At 17:18:06.1387573, just barely a millisecond after dropping the table, session 8 suddenly appears and starts to deallocate pages &ndash; pages it clearly did not allocate &ndash; until it, at 17:18:06.1417576, has deallocated all 1025 pages that belonged to our allocation_unit. That is exactly the same time when the allocation_unit itself was seen last. That this process started right after dropping the table is a coincidence and you might see several seconds pass before it starts, when you try this out yourself.
</p>
<p>
During the whole time of the recording from the table drop to ten seconds later, the information returned by sys.dm_db_session_space_usage DMV did not change at all. So, while the cleanup process can be seen to deallocate pages, that information never gets rolled up into the session space DMV. The session that originally created the table also never gets updated.
</p>
<h3>Versions</h3>
<p>
I have tested this in the following versions of SQL Server:
<pre>
Microsoft SQL Server 2005 - 9.00.5057.00 (X64)
Microsoft SQL Server 2008 (SP3) - 10.0.5500.0 (X64)
Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64)
Microsoft SQL Server 2012 RC0 - 11.0.1750.32 (X64)
</pre>
All SQL Server versions listed above show the same behavior.
</p>
<h3>Conclusion</h3>
<p>
I consider this behavior a bug in SQL Server. As the last example showed, the allocation_unit gets cleaned up by a background process. At that point the link to the original table and with it the originating session is gone, so the background process cannot update the session space usage information. However, right at the time of the drop, when the allocation_unit is marked as DROPPED, it also shows a size of 0 pages. At this point the task space usage information for the request executing the drop should have been updated to also reflect the drop. The pages could at the same time appear in the allocated column of one of the "TASK MANAGER" requests that execute the background cleanup tasks to make sure that the total of all allocated and deallocated pages in those two DMVs always matches the current state of tempdb.
</p>
<p>
I created a Connect item with Microsoft for this behavior here:<a href="https://connect.microsoft.com/SQLServer/feedback/details/725621/sys-dm-db-session-space-usage-inaccurate-after-deferred-drop-of-allocation-unit" title="https://connect.microsoft.com/SQLServer/feedback/details/725621/sys-dm-db-session-space-usage-inaccurate-after-deferred-drop-of-allocation-unit" target="_blank"></a>. If you agree that this should be addressed, please go there and vote for it.
</p>
</div><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/cb8IerYMilc" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/671/deferred-deallocations-of-pages-in-tempdb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/671/deferred-deallocations-of-pages-in-tempdb/</feedburner:origLink></item>
		<item>
		<title>Purge Problems [TSQL Tuesday #027 - The Big Data Valentine’s Edition]</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/Wdx6fXb147g/</link>
		<comments>http://sqlity.net/en/666/purge-problems-tsql-tuesday-027-the-big-data-valentines-edition/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 15:00:40 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=666</guid>
		<description><![CDATA[T-SQL Tuesday #27 is hosted by Steve Jones (blog&#124;twitter). This month’s topic is “Big Data”. Purge Problems in Big Data &#8211; Not only a SELECT needs an index Recently I was called to an issue at a 5+ TB customer. The purge job had stopped working a while back and the disk drives started to <a href="http://sqlity.net/en/666/purge-problems-tsql-tuesday-027-the-big-data-valentines-edition/#more-666'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<style type="text/css">
<!--
p {margin-top:8px;}
-->
</style>
<div>
<p><a href="https://voiceofthedba.wordpress.com/2012/02/07/t-sql-tuesday-027-the-big-data-valentines-edition/"><img height="132" border="0" hspace="9" width="131" alt="SqlTuesday Purge Problems [TSQL Tuesday #027   The Big Data Valentine’s Edition]" src="http://images.sqlity.net/SqlTuesday.png" title="Purge Problems [TSQL Tuesday #027   The Big Data Valentine’s Edition]" /></a></p>
<p>T-SQL Tuesday #27 is hosted by Steve Jones (<a href="https://voiceofthedba.wordpress.com/">blog</a>|<a href="https://twitter.com/#!/way0utwest">twitter</a>). This month’s topic is “<a href="https://voiceofthedba.wordpress.com/2012/02/07/t-sql-tuesday-027-the-big-data-valentines-edition/">Big Data</a>”.</p>
<h2>Purge Problems in Big Data &ndash; Not only a SELECT needs an index</h2>
<p>
Recently I was called to an issue at a 5+ TB customer. The purge job had stopped working a while back and the disk drives started to feel all bloated. 
</p>
<p>
The purge job ran every night to delete data older than n days. It would execute for over 13 hours and than quit reporting that there were not enough resources to complete the query.
</p>
<p>
The procedure would go through 5 tables and delete rows based on their relationship to "expired" records in a common parent table. After that, those parent records where supposed to get deleted. But the procedure never got to finish the third delete statement.
</p>
<p>
The problem was, that each delete would anew request the list of expired records form the parent table in a DELETE WHERE parent_id IN (SELECT id FROM parent); format. The inner select had a where clause checking a date column and an indicator column. While there was an index on the date column, SQL Server could not use it because of the age calculation performed. Also, the indicator was not part of the index. With this setup SQL Server had to perform a table scan of one of the bigger tables in that database for each of the child tables and than one more time for the parent table itself.
</p>
<p>
There are a few issues with this algorithm, including that by the time the parent table got to be purged more rows might qualify than at the time the first child was purged. That would cause foreign key violations during the delete &ndash; but the process never got that far anyway.
</p>
<p>
To resolve the issue I added the indicator to the index on the date column as an included column. I than rewrote the procedure to retrieved the ids of all to-be-purged records into a temp table, rewriting the select in a way that the index actually could be used. All the delete statements would than join to that temp table to delete the necessary rows. I also made sure that there was an index on the parent_id on each child table &ndash; a recommended best practice for all foreign key relationships anyway.
</p>
<p>
With those changes in place, the purge is now happily humming along again. What we can learn from it is the fact, that while indexes usually help SELECTs and hinder INSERTs and DELETEs, sometimes you need an index to be able to even execute your DELETE statements.
</p> <img src="http://feeds.feedburner.com/~r/sqlitynet/~4/Wdx6fXb147g" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/666/purge-problems-tsql-tuesday-027-the-big-data-valentines-edition/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/666/purge-problems-tsql-tuesday-027-the-big-data-valentines-edition/</feedburner:origLink></item>
		<item>
		<title>Purging a Table based on a Fixed Number of Rows to Keep</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/IDuPkfpO5HU/</link>
		<comments>http://sqlity.net/en/646/purging-a-table-based-on-a-fixed-number-of-rows-to-keep/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 15:00:01 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[Archiving and Purging]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=646</guid>
		<description><![CDATA[Purging old data out of the database usually is driven by a date. If the business does not need records that are older than a Year, and if your records have a date column, you can purge by deleting all records from that table that have a date that is older ("&#60;") than DATEADD(year,-1,GETDATE()) Sometimes <a href="http://sqlity.net/en/646/purging-a-table-based-on-a-fixed-number-of-rows-to-keep/#more-646'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div>
<p>
Purging old data out of the database usually is driven by a date. If the business does not need records that are older than a Year, and if your records have a date column, you can purge by deleting all records from that table that have a date that is older ("&lt;") than DATEADD(year,-1,GETDATE())
</p>
<p>
Sometimes however the purging is driven by a fixed number of rows that need to be kept. In this post I am going to show how to do this type of purging most efficiently.
<p>
<p>
For this article I am assuming that the table that needs to be purged has a clustered index on an IDENTITY column. The final solution I am going to present does not rely on that fact. However if the order of rows for the purging process is not given by the clustered index, another method might be more efficient.
</p>
<p>
To start let us create a table and insert 10000 rows into it:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.ToBePurged') IS NOT NULL DROP TABLE dbo.ToBePurged;
CREATE TABLE dbo.ToBePurged(Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, PlaceHolderForOtherColumns CHAR(400));
INSERT INTO dbo.ToBePurged(PlaceHolderForOtherColumns) SELECT TOP(10000) 'Fill' FROM sys.system_internals_partition_columns a,sys.system_internals_partition_columns b;
</pre>
</div>
<p>
One of the ways to go about this purge is to find the maximum IDENTITY value in the table and subtract the number of rows to keep from it. That will give you the largest IDENTITY value that needs to be deleted:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
DECLARE @NoRowsToKeep INT = 4864;
DELETE dbo.ToBePurged WHERE Id &lt;= (SELECT MAX(Id) - @NoRowsToKeep FROM dbo.ToBePurged);
</pre>
</div>
<p>
This looks neat, however there is a problem: If there is a gap in the IDENTITY values in the table you will end up deleting to many rows:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.ToBePurged') IS NOT NULL DROP TABLE dbo.ToBePurged;
CREATE TABLE dbo.ToBePurged(Id INT IDENTITY(13,5) PRIMARY KEY CLUSTERED, PlaceHolderForOtherColumns CHAR(400));
INSERT INTO dbo.ToBePurged(PlaceHolderForOtherColumns) SELECT TOP(10000) 'Fill' FROM sys.system_internals_partition_columns a,sys.system_internals_partition_columns b;
GO
DECLARE @NoRowsToKeep INT = 4864;
DELETE dbo.ToBePurged WHERE Id &lt;= (SELECT MAX(Id) - @NoRowsToKeep FROM dbo.ToBePurged);
SELECT COUNT(1) Cnt, @NoRowsToKeep NoRowsToLeave FROM dbo.ToBePurged;
</pre>
</div>
<p>
In this example I created the table with an IDENTITY step size greater than one. This might seem arbitrary, however keep in mind that the increment of the IDENTITY value for a table never gets rolled back. So if you ever had a failing insert you also have gaps.
</p>
<p>
The result of the last query in the above example shows that there are only 973 rows left in the table instead of the expected 4864. So we need to get a little more sophisticated:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
DECLARE @NoRowsToKeep INT = 4864;
SET STATISTICS IO ON;
DELETE dbo.ToBePurged WHERE Id NOT IN (SELECT TOP(@NoRowsToKeep) Id FROM dbo.ToBePurged ORDER BY Id DESC);
SET STATISTICS IO OFF;
SELECT COUNT(1) Cnt, @NoRowsToKeep NoRowsToLeave FROM dbo.ToBePurged;
</pre>
</div>
<p>
This, and all following examples assume that the table was recreated as in the previous example. That ensures that the numbers we are going to look at are comparable.
</p>
<p>
This solution reads a total of 21412 pages while using two Clustered Index Scan RelOps and a Sort:
</p>
<div>
<pre>
Table 'ToBePurged'. Scan count 2, logical reads 11060, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 10352, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
</pre>
</div>
<div><div id="attachment_657" class="wp-caption alignnone" style="width: 705px"><a href="http://sqlity.net/en/wp-content/uploads/2012/02/purge_by_row_count_execution_plan_1.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/02/purge_by_row_count_execution_plan_1-1024x140.png" alt="purge by row count execution plan 1 1024x140 Purging a Table based on a Fixed Number of Rows to Keep" title="purge_by_row_count_execution_plan_1" width="695" height="95" class="size-large wp-image-657" /></a><p class="wp-caption-text">Execution Plan 1 - click to enlarge</p></div></div>
<p>
We can replace the second Scan RelOp with a Seek RelOp:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
DECLARE @NoRowsToKeep INT = 4864;
SET STATISTICS IO ON;
DELETE dbo.ToBePurged WHERE Id &lt; (SELECT MIN(Id) FROM (SELECT TOP(@NoRowsToKeep) Id FROM dbo.ToBePurged ORDER BY Id DESC)X);
SET STATISTICS IO OFF;
SELECT COUNT(1) Cnt, @NoRowsToKeep NoRowsToLeave FROM dbo.ToBePurged;
</pre>
</div>
<p>
This solution reads a total of 211156 pages while using a Clustered Index Scan and a Clustered Index Seek RelOp without a Sort:
</p>
<div>
<pre>
Table 'ToBePurged'. Scan count 2, logical reads 10804, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 10352, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
</pre>
</div>
<div><div id="attachment_656" class="wp-caption alignnone" style="width: 705px"><a href="http://sqlity.net/en/wp-content/uploads/2012/02/purge_by_row_count_execution_plan_2.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/02/purge_by_row_count_execution_plan_2-1024x143.png" alt="purge by row count execution plan 2 1024x143 Purging a Table based on a Fixed Number of Rows to Keep" title="purge_by_row_count_execution_plan_2" width="695" height="97" class="size-large wp-image-656" /></a><p class="wp-caption-text">Execution Plan 2 - click to enlarge</p></div></div>
<p>
If you look closely, the seek is still a scan, it just starts in the middle of the table instead of at the beginning. To find the starting point, SQL Server executes a seek operation, hence the Clustered Index Seek RelOp in the plan.
</p>
<p>
The biggest issue with both solutions is however the need of a spool. SQL Server uses an Eager Spool RelOp when it needs to delete rows from a table based on the results of a complex (enough) query. So how can we simplify this query?
</p>
<p>
When SQL Server executes the previous query, it actually does one complete table scan: The first Scan RelOp that is followed by a top reads through the first 4864 rows to determine the MIN(Id) value. The Seek RelOp then finds the first row that has an Id value smaller than that MIN value and continues to read through all remaining rows from there. The work performed is therefore one complete scan through the table.
</p>
<p>
It should reduce the complexity of this query significantly, if we could do both steps with one Clustered Index Scan RelOp. We basically want SQL Server to read through all the rows in one ordered swoop and start deleting once we are past the rows we need to keep. We can achieve that with the ROW_NUMBER() function:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
DECLARE @NoRowsToKeep INT = 4864;
SET STATISTICS IO ON;
DELETE tbp
  FROM (SELECT ROW_NUMBER() OVER(ORDER BY Id DESC) rn FROM dbo.ToBePurged) tbp
 WHERE tbp.rn &gt; @NoRowsToKeep;
SET STATISTICS IO OFF;
SELECT COUNT(1) Cnt, @NoRowsToKeep NoRowsToLeave FROM dbo.ToBePurged;
</pre>
</div>
<p>
This solution now reads only a total of 10801 pages using only one Clustered Index Scan RelOp:
</p>
<div>
<pre>
Table 'ToBePurged'. Scan count 1, logical reads 10801, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
</pre>
</div>
<div><div id="attachment_660" class="wp-caption alignnone" style="width: 705px"><a href="http://sqlity.net/en/wp-content/uploads/2012/02/purge_by_row_count_execution_plan_3.png"><img src="http://sqlity.net/en/wp-content/uploads/2012/02/purge_by_row_count_execution_plan_3-1024x88.png" alt="purge by row count execution plan 3 1024x88 Purging a Table based on a Fixed Number of Rows to Keep" title="purge_by_row_count_execution_plan_3" width="695" height="59" class="size-large wp-image-660" /></a><p class="wp-caption-text">Execution Plan 3 - click to enlarge</p></div></div>
<p>
This query is making use of the fact, that SQL Server knows to delete the rows in the underlying table even so the FROM clause is referencing a sub-select. Because the sub-select does not reference any columns of the table, you do not need to worry about name collisions either. The above query works even, if the table contains a column with the name 'rn'.
</p>


</div><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/IDuPkfpO5HU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/646/purging-a-table-based-on-a-fixed-number-of-rows-to-keep/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/646/purging-a-table-based-on-a-fixed-number-of-rows-to-keep/</feedburner:origLink></item>
		<item>
		<title>Selecting the entire Database as XML String – 2</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/mpDbqxzeaD4/</link>
		<comments>http://sqlity.net/en/626/selecting-the-entire-database-as-xml-string-2/#comments</comments>
		<pubDate>Wed, 01 Feb 2012 17:11:49 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=626</guid>
		<description><![CDATA[About two weeks ago I wrote about how to get the content of the entire database into an XML document. Today Daniel commented, that he would like the XML to be in a compacter format. In this post I am going to explain how to get there. While WordPress&#8482; conveniently ate the example XML Daniel <a href="http://sqlity.net/en/626/selecting-the-entire-database-as-xml-string-2/#more-626'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div>
<p>
About two weeks ago I wrote about <a href="http://sqlity.net/en/577/selecting-the-entire-database-as-xml-string/">how to get the content of the entire database into an XML document</a>. Today <a href="http://sqlity.net/en/577/selecting-the-entire-database-as-xml-string/#comment-18" title="Daniel's comment">Daniel commented</a>, that he would like the XML to be in a compacter format. In this post I am going to explain how to get there.
</p>
<p>
While WordPress&trade; conveniently ate the example XML Daniel tried to post, I will assume for this article that he was going for this format:
</p>
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;tables&gt;
  &lt;table name=&quot;[dbo].[T1366]&quot;&gt;
    &lt;row id=&quot;1&quot; c1=&quot;1366&quot; c2=&quot;2732&quot; c3=&quot;4098&quot; c4=&quot;5464&quot; c5=&quot;6830&quot; c6=&quot;8196&quot; /&gt;
    &lt;row id=&quot;2&quot; c1=&quot;1366&quot; c2=&quot;2732&quot; c3=&quot;4098&quot; c4=&quot;5464&quot; c5=&quot;6830&quot; c6=&quot;8196&quot; /&gt;
  &lt;/table&gt;
  &lt;table name=&quot;[dbo].[T127]&quot;&gt;
    &lt;row id=&quot;1&quot; c1=&quot;127&quot; c2=&quot;254&quot; c3=&quot;381&quot; c4=&quot;508&quot; c5=&quot;635&quot; c6=&quot;762&quot; c7=&quot;889&quot; /&gt;
    &lt;row id=&quot;2&quot; c1=&quot;127&quot; c2=&quot;254&quot; c3=&quot;381&quot; c4=&quot;508&quot; c5=&quot;635&quot; c6=&quot;762&quot; c7=&quot;889&quot; /&gt;
  &lt;/table&gt;
&lt;/tables&gt;
</pre>
</div>
<p>
Instead of all column values being sub-nodes of their &lt;row&gt; nodes they are now attributes of an empty &lt;row&gt; node. Also the &lt;data&gt; node between &lt;table&gt; and &lt;row&gt; is now missing, as it was kind of superfluous. And while I was changing the code anyway, I went ahead and added the previously missing root node as &lt;tables&gt;.
</p>
<p>
So how did I get there?
</p>
<p>
First let us look at the missing root node, which turns out to be the simplest of all the changes. When you select rows for a table specifying the FOR XML PATH('NodeName') directive, every row is going to be represented as a node with the passed in name. Those nodes are just hanging together without an enclosing root node. To make the output a valid XML document you can just specify the ROOT clause next to the PATH clause like this:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
  SELECT * 
    FROM dbo.Table
  FOR XML PATH('rowNodeName'),ROOT('rootNodeName'),TYPE
</pre>
</div>
<p>
The TYPE directive tells SQL Server to return the XML as a value with the XML type. If you leave it out the XML gets returned as a string, which can create problems if you want to continue to work with it in T-SQL.
</p>
<p>
The next step was, to get rid of the &lt;data&gt; node. When you specify an XML value in a select statement that is in turn using FOR XML PATH, that value is wrapped in a node with the value's column name. The generated SQL in my original solution looked like this:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT * FROM(
SELECT '[dbo].[t1]' [@name],(SELECT * FROM dbo.t1 FOR XML PATH('row'),TYPE) data
UNION ALL
SELECT '[dbo].[t2]' [@name],(SELECT * FROM dbo.t2 FOR XML PATH('row'),TYPE) data
)X FOR XML PATH('table'),TYPE;
</pre>
</div>
<p>
The data column in the UNION ALL query contains XML values, so the outer SELECT * ... FOR XML PATH('table') is going to wrap each one into a &lt;data&gt; node like this:
</p>
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;table name=&quot;[dbo].[t1]&quot;&gt;
  &lt;data&gt;
    &lt;row&gt;
      &lt;id&gt;1&lt;/id&gt;
      &lt;c1&gt;1&lt;/c1&gt;
    &lt;/row&gt;
    &lt;row&gt;
      &lt;id&gt;2&lt;/id&gt;
      &lt;c1&gt;1&lt;/c1&gt;
    &lt;/row&gt;
  &lt;/data&gt;
&lt;/table&gt;
&lt;table name=&quot;[dbo].[t2]&quot;&gt;
  &lt;data&gt;
    &lt;row&gt;
      &lt;id&gt;1&lt;/id&gt;
      &lt;c1&gt;2&lt;/c1&gt;
      &lt;c2&gt;4&lt;/c2&gt;
    &lt;/row&gt;
    &lt;row&gt;
      &lt;id&gt;2&lt;/id&gt;
      &lt;c1&gt;2&lt;/c1&gt;
      &lt;c2&gt;4&lt;/c2&gt;
    &lt;/row&gt;
  &lt;/data&gt;
&lt;/table&gt;
</pre>
</div>
<p>
To remove the additional node we can use the fact that SQL Sever will inline the value of a column that does not have a name. So we need to select the data column without specifying its name. To do that we will use a trick:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT [@name],(SELECT data) FROM(
SELECT '[dbo].[t1]' [@name],(SELECT * FROM dbo.t1 FOR XML PATH('row'),TYPE) data
UNION ALL
SELECT '[dbo].[t2]' [@name],(SELECT * FROM dbo.t2 FOR XML PATH('row'),TYPE) data
)X FOR XML PATH('table'),TYPE;
</pre>
</div>
<p>
By wrapping the data column in its own SELECT it loses its name property for the outer SELECT statement. The output it generates now looks like this:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
&lt;table name=&quot;[dbo].[t1]&quot;&gt;
  &lt;row&gt;
    &lt;id&gt;1&lt;/id&gt;
    &lt;c1&gt;1&lt;/c1&gt;
  &lt;/row&gt;
  &lt;row&gt;
    &lt;id&gt;2&lt;/id&gt;
    &lt;c1&gt;1&lt;/c1&gt;
  &lt;/row&gt;
&lt;/table&gt;
&lt;table name=&quot;[dbo].[t2]&quot;&gt;
  &lt;row&gt;
    &lt;id&gt;1&lt;/id&gt;
    &lt;c1&gt;2&lt;/c1&gt;
    &lt;c2&gt;4&lt;/c2&gt;
  &lt;/row&gt;
  &lt;row&gt;
    &lt;id&gt;2&lt;/id&gt;
    &lt;c1&gt;2&lt;/c1&gt;
    &lt;c2&gt;4&lt;/c2&gt;
  &lt;/row&gt;
&lt;/table&gt;
</pre>
</div>
<p>
 The last thing we need to accomplish is to make the column values attributes of each &lt;row&gt; node instead of sub-nodes. If you specify a column name with a leading "@" sign, FOR XML PATH will make that value an attribute with that name (without the "@"):
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT id AS [@id] FROM dbo.t1 FOR XML PATH('row'),TYPE;
</pre>
<div>
<p>
The output of above statement looks like this:
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;row id=&quot;1&quot; /&gt;
&lt;row id=&quot;2&quot; /&gt;
</pre>
</div>
<p>
So instead of a SELECT * FROM for each table, we need to list all columns in the format shown above. To get this specially formatted column list we are going to use <a href="http://sqlity.net/en/400/t-sql-tuesday-22-data-presentation/">XML concatenation</a> one more time:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT STUFF((SELECT ','+QUOTENAME(name)+' AS '+QUOTENAME('@'+name) 
                              FROM sys.columns c 
                             WHERE c.object_id = OBJECT_ID('dbo.t2')
                             ORDER BY column_id 
                            FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'');
</pre>
</div>
<p>
This produces the desired list:
<pre>[id] AS [@id],[c1] AS [@c1],[c2] AS [@c2]</pre>
</p>
<p>
With that we have all the parts collected to create the new "compact" solution. The last step is to put it all together:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
DECLARE @cmd NVARCHAR(MAX) ;
SET @cmd = 'SELECT [@name],(SELECT data) FROM('
    + STUFF(
(SELECT ' UNION ALL SELECT ''' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id))
        + '.' + QUOTENAME(t.name) + ''' [@name],' + '(SELECT '+c.ColList+' FROM '
        + QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + '.' + QUOTENAME(t.name)
        + ' FOR XML PATH(''row''),TYPE) data'
 FROM   sys.tables t
 CROSS APPLY (SELECT STUFF((SELECT ','+QUOTENAME(name)+' AS '+QUOTENAME('@'+name) 
                              FROM sys.columns c 
                             WHERE c.object_id = t.object_id
                             ORDER BY column_id 
                            FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'')
             )c(ColList)
    FOR     XML PATH('') ,
                TYPE).value('.', 'NVARCHAR(MAX)'), 1, 11, '')
    + ')X FOR XML PATH(''table''),ROOT(''tables''),TYPE;' ;

EXEC(@cmd) ;
</pre>
</div>
<p>
This produces the desired output that was shown all the way at the beginning of this article. The size of the resulting XML document for my example database went from about 1.4 MB for the original solution to now about 0.5 MB &ndash; a significant improvement.
</p>
</div><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/mpDbqxzeaD4" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/626/selecting-the-entire-database-as-xml-string-2/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/626/selecting-the-entire-database-as-xml-string-2/</feedburner:origLink></item>
		<item>
		<title>The Performance Impact of Forwarded Records on Table Scans</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/m7p0XluTc3M/</link>
		<comments>http://sqlity.net/en/609/the-performance-impact-of-forwarded-records-on-table-scans/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 16:00:18 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server Internals]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=609</guid>
		<description><![CDATA[Forwarded records are a mechanism that SQL Server employs to reduce the amount of maintenance work for indexes on a table without a clustered index. During an update of a record in a table that does not have a clustered index, an increase in size might prevent the record to fit on its current page. <a href="http://sqlity.net/en/609/the-performance-impact-of-forwarded-records-on-table-scans/#more-609'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div>
<p>
Forwarded records are a mechanism that SQL Server employs to reduce the amount of maintenance work for indexes on a table without a clustered index. During an update of a record in a table that does not have a clustered index, an increase in size might prevent the record to fit on its current page. If that happens, a forwarding pointer is created in the current page and the record is moved to a new location.
</p>
<p>
It is well known that such forwarded records will cause each RID-Lookup operation that encounters one to have to read a second page. If you are doing a filtered index scan followed by an RID-Lookup this can potentially double the amount of work necessary to retrieve all rows.
</p>
<p>
What is less well known is the fact, that such forwarded records can significantly impact the performance of a simple table scan as well. In this article I will show you why this happens.
</p>
<h3>Terminology</h3>
<p>
To explain the way SQL Server handles forwarded records during a table scan, I am first going to rehash some storage basics.
</p>
<p>
SQL Server knows two ways to organize the data in a table on disk. One is the Clustered Index and the other one the Heap. In a Clustered Index data is organized in a B+Tree structure allowing for fast access based on the Clustered Index Key. In a Heap on the other hand the data is not sorted or linked in any way. Any new row is inserted into the table in any free space big enough to hold the row.
</p>
<p>
In both cases the smallest unit of storage and access is a Page. Every piece of data SQL Server needs to store on disk is organized in blocks called pages. Each page is exactly 8192 Bytes in size. The only exception to this rule are the log files. Log data is stored in a structure called virtual log file and those are not part of today's discussion.
</p>
<p>
Indexes created on a Heap use a physical address to point from the index record to the underlying data record. This address consists of the file number, the number of the page that contains the record within that file and the slot number inside that page.
</p>
<h3>Forwarded Records</h3>
<p>
When a record with variable length data type columns is updated, its size might increase. If the page it is stored in is filled already, the updated record might need to be moved to a new page. If that happens, all pointers in any index on that table addressing the moved row now point to the wrong (old) place. There are two ways to handle this situation: Update all indexes with the new location or put a special record in the old place that "forwards" to the new location.
</p>
<p>
Going through all existing indexes can be extensively resource intensive. For that reason SQL Server is using the second option to put a forwarding record in the old place.
</p>
<p>
Such a forwarding record links to the new physical address of the actual data. The new record also contains a pointer back to the forwarding record. This back-pointer allows the forwarding record to get updated in case the data record has to move again or in case it gets deleted which causes the forwarding record to be deleted as well. Updating the forwarding records in cases of repeated data record moves prevents long forwarding chains. Instead there is always only at max on hop.
</p>
<p>
In the case of an RID-Lookup SQL Server has to follow those forwarding records to get to the actual data record. This causes an additional (logical) page read for each forwarding record encountered.
</p>
<p>
During a table scan on the other hand each page will be read anyway so SQL Server could just ignore all forwarding records. That however could cause reading inconsistencies. Rows that are updated after the scan started and before it finishes could be missed entirely by the scan, if the record gets moved to a page that was scanned already. For that reason SQL Server follows each forwarding record immediately. This not only causes an additional read right there, is also turns the sequential read of all pages in allocation order into a succession of random reads, hurting performance potentially even more.
</p>
<h3>Demonstration</h3>
<p>
To demonstrate this behavior let us set up two tables dbo.ForwardSmall and dbo.NoForwardSmall as well as a Procedure dbo.ChangeRow to help with the creation of forwarded records:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.ForwardSmall') IS NOT NULL DROP TABLE dbo.ForwardSmall;
CREATE TABLE dbo.ForwardSmall(Id INT ,F VARCHAR(8000))
GO
IF OBJECT_ID('dbo.NoForwardSmall') IS NOT NULL DROP TABLE dbo.NoForwardSmall;
CREATE TABLE dbo.NoForwardSmall(Id INT ,F VARCHAR(8000))
GO

IF OBJECT_ID('dbo.ChangeRow') IS NOT NULL DROP PROCEDURE dbo.ChangeRow;
GO
CREATE PROCEDURE dbo.ChangeRow
@Id INT,
@Size INT
AS
BEGIN
  SET NOCOUNT ON;
  MERGE dbo.ForwardSmall fs
  USING (SELECT @Id,REPLICATE('X',@Size)) x(Id,F)
  ON fs.Id = x.Id
  WHEN MATCHED THEN
    UPDATE SET fs.F = x.F
  WHEN NOT MATCHED THEN
    INSERT (Id,F)VALUES(x.Id,x.F);
END
GO
</pre>
</div>
<p>
The procedure will take an Id parameter as well as a size parameter. It uses to merge command to either insert or update the record with the given Id value. The F column will be valued with a string of length @Size.
</p>
<p>
The following script will create 16 rows in the table dbo.ForwardSmall and then update those records to create forwarded records.
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
EXEC dbo.ChangeRow 11,900;
EXEC dbo.ChangeRow 12,900;
EXEC dbo.ChangeRow 13,900;
EXEC dbo.ChangeRow 14,900;
EXEC dbo.ChangeRow 15,900;
EXEC dbo.ChangeRow 16,900;
EXEC dbo.ChangeRow 17,900;
EXEC dbo.ChangeRow 18,900;

EXEC dbo.ChangeRow 21,900;
EXEC dbo.ChangeRow 22,900;
EXEC dbo.ChangeRow 23,900;
EXEC dbo.ChangeRow 24,900;
EXEC dbo.ChangeRow 25,900;
EXEC dbo.ChangeRow 26,900;
EXEC dbo.ChangeRow 27,900;
EXEC dbo.ChangeRow 28,900;
--------------------------
EXEC dbo.ChangeRow 27,0;
EXEC dbo.ChangeRow 28,0;
EXEC dbo.ChangeRow 18,1800;
EXEC dbo.ChangeRow 28,900;
EXEC dbo.ChangeRow 18,900;

EXEC dbo.ChangeRow 26,0;
EXEC dbo.ChangeRow 17,1800;
EXEC dbo.ChangeRow 27,900;
EXEC dbo.ChangeRow 17,900;

EXEC dbo.ChangeRow 25,0;
EXEC dbo.ChangeRow 16,1800;
EXEC dbo.ChangeRow 26,900;
EXEC dbo.ChangeRow 16,900;

EXEC dbo.ChangeRow 24,0;
EXEC dbo.ChangeRow 15,1800;
EXEC dbo.ChangeRow 25,900;
EXEC dbo.ChangeRow 15,900;


EXEC dbo.ChangeRow 23,0;
EXEC dbo.ChangeRow 14,1800;
EXEC dbo.ChangeRow 24,900;
EXEC dbo.ChangeRow 14,900;

EXEC dbo.ChangeRow 22,0;
EXEC dbo.ChangeRow 13,1800;
EXEC dbo.ChangeRow 23,900;
EXEC dbo.ChangeRow 13,900;

EXEC dbo.ChangeRow 21,0;
EXEC dbo.ChangeRow 12,1800;
EXEC dbo.ChangeRow 22,1499;
EXEC dbo.ChangeRow 12,780;

EXEC dbo.ChangeRow 11,910;

EXEC dbo.ChangeRow 22,900;
EXEC dbo.ChangeRow 21,910;
</pre>
</div>
<p>
Each record starts out with ~950 bytes, so the 16 records will fit into two pages. From there the records are updated in a way so that each of them ends up on the other page with a forwarding record in its old place.
</p>
<p>
Now lets put the exact same data into the dbo.NoForwardSmall table:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
INSERT INTO dbo.NoForwardSmall
SELECT * FROM dbo.ForwardSmall;
</pre>
</div>
<p>
This again requires two pages, but this time no forwarded records are created.
</p>
<p>
To confirm, we can use the following query using the sys.dm_db_index_physical_stats DMF:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT OBJECT_SCHEMA_NAME(object_id)+'.'+OBJECT_NAME(object_id) Tbl,index_type_desc,alloc_unit_type_desc,page_count,record_count,forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(),NULL,NULL,NULL,'detailed') ps
WHERE object_id IN (OBJECT_ID('dbo.ForwardSmall'),OBJECT_ID('dbo.NoForwardSmall'));
</pre>
</div>
<p>
This query returns the number of pages, the total number of records and the number of forwarded records for each table:
</p>
<div>
<style type="text/css">
table.tableizer-table {border: 1px solid #CCC; font-family: Arial, Helvetica, sans-serif; font-size: 12px;} .tableizer-table td {padding: 4px; margin: 3px; border: 1px solid #ccc;}
.tableizer-table th {background-color: #104E8B; color: #FFF; font-weight: bold;}
</style>
<table class="tableizer-table">
<tr class="tableizer-firstrow"><th>Tbl</th><th>index_type_desc</th><th>alloc_unit_type_desc</th><th>page_count</th><th>record_count</th><th>forwarded_record_count</th></tr> <tr><td>dbo.ForwardSmall</td><td>HEAP</td><td>IN_ROW_DATA</td><td>2</td><td>32</td><td>16</td></tr> <tr><td>dbo.NoForwardSmall</td><td>HEAP</td><td>IN_ROW_DATA</td><td>2</td><td>16</td><td>0</td></tr></table>
</div>
<p>
Now let us look at what happens during a table scan:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SET STATISTICS IO ON;
SELECT * FROM dbo.NoForwardSmall;
GO
SELECT * FROM dbo.ForwardSmall;
SET STATISTICS IO OFF;
</pre>
</div>
<p>
This produces the following output:
</p>
<div>
<pre>
(16 row(s) affected)
Table 'NoForwardSmall'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(16 row(s) affected)
Table 'ForwardSmall'. Scan count 1, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
</pre>
</div>
<p>
Scanning the dbo.NoForwardSmall table caused 2 logical reads - one for each page of the table. Scanning the dbo.ForwardSmall table however caused 18 logical reads. That is almost ten times more. Each of the 16 records caused an additional page read.
</p>
<h3>Conclusion</h3>
<p>
This article showed that SQL Server follows each forwarding record it encounters right away. This not only happens in the case of an RID-Lookup but also in the case of a table scan. These additional reads can present a significant performance impact. If you have a decently sized table with variable length columns and with regular update activity, add this behavior to the list of reasons why the table really should have a clustered index. 
</p>
</div><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/m7p0XluTc3M" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/609/the-performance-impact-of-forwarded-records-on-table-scans/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/609/the-performance-impact-of-forwarded-records-on-table-scans/</feedburner:origLink></item>
		<item>
		<title>2012-02-08: Index Internals – February 2012 Philadelphia SQL Server Users Group Meeting</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/pHUF7pP8ZEg/</link>
		<comments>http://sqlity.net/en/605/2012-02-08-index-internals-february-2012-philadelphia-sql-server-users-group-meeting/#comments</comments>
		<pubDate>Sun, 29 Jan 2012 01:37:19 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=605</guid>
		<description><![CDATA[On 2/8 I am going repeat my talk about Index Internals at the Philadelphia SQL Server Users Group Meeting. Signup here to attend: http://pssug20120208.eventbrite.com/ TitleIndex Internals - How SQL Server stores and accesses data When02/08/2012 at 17:30 Where Philadelphia SQL Server Users Group Meeting DeVry University Rm 121 1140 Virginia Dr Fort Washington, PA]]></description>
			<content:encoded><![CDATA[<p>
On 2/8 I am going repeat my talk about Index Internals at the <a href="http://www.pssug.org/">Philadelphia SQL Server Users Group</a> Meeting. <br />Signup here to attend: <a href="http://pssug20120208.eventbrite.com/">http://pssug20120208.eventbrite.com/</a>
</p>
<style type="text/css"><!-- .WhenAndWhere td {padding:5px; border:1px solid #000;border-radius:3px;} --></style>
<table  class="WhenAndWhere">
<tr>
<td>Title</td><td><a href="http://sqlity.net/en/592/presentation-materials-index-internals/">Index Internals - How SQL Server stores and accesses data</a></td>
</tr>
<tr>
<td>When</td><td>02/08/2012 at 17:30</td>
</tr><tr>
<td>Where</td><td>
<div>
<a href="http://pssug20120208.eventbrite.com/">Philadelphia SQL Server Users Group Meeting</a>
<br />DeVry University
<br />Rm 121
<br />1140 Virginia Dr
<br />Fort Washington, PA
</div>
</td>
</tr>
</table><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/pHUF7pP8ZEg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/605/2012-02-08-index-internals-february-2012-philadelphia-sql-server-users-group-meeting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/605/2012-02-08-index-internals-february-2012-philadelphia-sql-server-users-group-meeting/</feedburner:origLink></item>
		<item>
		<title>How to rollback in procedures</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/b9olR8YHJBY/</link>
		<comments>http://sqlity.net/en/585/how-to-rollback-in-procedures/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 15:00:39 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Best Practice]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Transactions]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=585</guid>
		<description><![CDATA[Transaction Management The question where in the call stack a transaction should be managed, is similar to the one where errors should be handled. There are several strong reasons to push the transaction management as high up the stack as possible. The most obvious reason is the lack of support for nested transactions in SQL <a href="http://sqlity.net/en/585/how-to-rollback-in-procedures/#more-585'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div>
<h3>Transaction Management</h3>
<p>
The question where in the call stack a transaction should be managed, is similar to the one where errors should be handled. There are several strong reasons to push the transaction management as high up the stack as possible. The most obvious reason is the lack of support for nested transactions in SQL Server.
</p>
<p>
SQL Server does allow to nest BEGIN TRANSACTION ... COMMIT blocks within each other. However, only the outermost layer actually opens and closes a transaction. In the inner layers SQL Server just maintains a counter; only the outermost COMMIT has any effect on the data.
</p>
<p>
With that structure in place, SQL Server has no way to do a partial rollback, so if a rollback is requested, everything is rolled back to the beginning of the outer most transaction.
</p>
<p>
If you have code, that does some data changes followed by a procedure call followed by other data changes, all within a transaction, and the procedure decides to execute a rollback, the rest of the data changes after the procedure call will execute outside of any transaction. Among other catastrophies this can cause inconsistencies in your data and it will prevent another rollback later.
</p>
<p>
SQL Server realizes this to be a mayor issue. It raises an error anytime the transaction count before and after the execution of a procedure do not match. You can try this out with this code snippet:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.DoRollback') IS NOT NULL DROP PROCEDURE dbo.DoRollback;
GO
CREATE PROCEDURE dbo.DoRollback
AS
BEGIN
  BEGIN TRAN
  ROLLBACK;
  RETURN 0;
END;
GO

RAISERROR('Calling dbo.DoRollback outside of transaction',0,1)WITH NOWAIT;
EXEC dbo.DoRollback; -- No Error

GO

RAISERROR('Calling dbo.DoRollback inside of transaction',0,1)WITH NOWAIT;
BEGIN TRAN;
EXEC dbo.DoRollback; -- Error
</pre>
</div>
<p>
Executing the procedure after starting a transaction causes an error like this:
</p>
<pre>Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 1, current count = 0.</pre>
<p>
That error however does not prevent execution of additional code. The following example demonstrates that by calling the above procedure within another procedure:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.CallDoRollback') IS NOT NULL DROP PROCEDURE dbo.CallDoRollback;
GO
CREATE PROCEDURE dbo.CallDoRollback
AS
BEGIN
  BEGIN TRAN;
  EXEC dbo.DoRollback;
  RAISERROR('--Inside dbo.CallDoRollback after calling dbo.DoRollback',0,1)WITH NOWAIT;
  IF(@@TRANCOUNT&gt;0)ROLLBACK;
  RETURN 0;
END;
GO

RAISERROR('Calling dbo.CallDoRollback',0,1)WITH NOWAIT;
EXEC dbo.CallDoRollback;
</pre>
</div>
<p>
So the conclusion here is, that a stored procedure should never execute a rollback. Instead it should raise an error to signal to the outer layer that a situation occurred that requires some mediation like a rollback.
</p>
<h3>Partial Rollbacks</h3>
<p>
Every once in while you need to code a procedure that needs to undo its own actions independent of the circumstances it was called in. That can be achieved with a partial rollback.
</p>
<p>
While SQL Server does not support nested transactions, it allows you to set a save-point within an open transaction. After setting a safe-point you can then rollback all changes that occurred after it. The DoCorrectRollback procedure demonstrates how you can use that functionality in your procedures.
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.DoCorrectRollback') IS NOT NULL DROP PROCEDURE dbo.DoCorrectRollback;
GO
CREATE PROCEDURE dbo.DoCorrectRollback
AS
BEGIN
  DECLARE @ErrorStatus INT; SET @ErrorStatus = 0;
  DECLARE @TranName VARCHAR(32);SET @TranName = REPLACE((CAST(NEWID() AS VARCHAR(36))),'-','');

  BEGIN TRAN;
  RAISERROR('Setting savepoint: %s',0,1,@TranName)WITH NOWAIT;
  SAVE TRAN @TranName;
 
  BEGIN TRY
    EXEC dbo.DoError;
  END TRY
  BEGIN CATCH
    DECLARE @msg NVARCHAR(MAX);SET @msg = ERROR_MESSAGE();
    RAISERROR('Caught this error: %s',0,1,@msg)WITH NOWAIT;
    SET @ErrorStatus = -1
  END CATCH;

  IF(@ErrorStatus != 0)
  BEGIN
    RAISERROR('Rolling back to savepoint: %s',0,1,@TranName)WITH NOWAIT;
    ROLLBACK TRANSACTION @TranName;
  END;
  
  COMMIT;
  
  RETURN 0;
END;
GO

RAISERROR('Calling dbo.DoCorrectRollback outside of transaction',0,1)WITH NOWAIT;
EXEC dbo.DoCorrectRollback;

GO

RAISERROR('Calling dbo.DoCorrectRollback inside of transaction',0,1)WITH NOWAIT;
BEGIN TRAN
  EXEC dbo.DoCorrectRollback;
COMMIT;
GO
</pre>
</div>
<p>
The procedure first executes a BEGIN TRANSACTION to make sure that there is an active transaction. (Remember, this does not have any effect inside a preexisting transaction.) Next, the procedure sets a save-point. A save-point requires a unique name that can be up to 32 characters long. The procedure uses the NEWID() function to get this name. NEWID() returns 36 characters of which 4 are a '-', so by removing them we end up with the required 32 characters. 
</p>
<p>
After this transaction setup section the actual code is executed inside a TRY...CATCH block. If an error is thrown, the error handling code causes a ROLLBACK to be executed. The ROLLBACK TRANSACTION statement takes the save-point name as parameter, which causes the rollback to undo only actions that happened after the save-point was set - the beginning of the procedure in our case.
</p>
<p>
This ROLLBACK to the previously set save-point does not change the open transaction counter at all, so before the procedure returns it needs to execute a commit to bring the transaction counter back to the state it was in at the beginning of the procedure. This step has to be taken independently of the ROLLBACK being executed or not. It should therefore be the last statement of the procedure.
</p>
<p>
This allows the procedure to be called inside and outside of a transaction without causing adverse effects by executing a rollback.
</p>
<h3>Limitations</h3>
<p>
SQL Server allows to set several save-points within one active transaction as long as they have unique names. If a name is reused the existing save-point is moved to the current position and no error is thrown, so be sure to use unique names as shown in the example.
</p>
<p>
Another limitation is, that save-points cannot be used in distributed transactions spanning more than one SQL Server instance. If you try to set a save-point in a distributed transaction an error is thrown.
</p>
<p>
Lastly, the example given does not notify the outer layers about the existence of a problem. This should certainly be part of the design of your procedures. There are several ways to achieve this, for example by raising an error after the rollback step that contains the reason for the rollback.
</p>
</div><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/b9olR8YHJBY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/585/how-to-rollback-in-procedures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/585/how-to-rollback-in-procedures/</feedburner:origLink></item>
		<item>
		<title>2012-01-24: Index Internals – January 2012 SJ SQL Server Meeting</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/tqlsNpIlj50/</link>
		<comments>http://sqlity.net/en/594/2012-01-24-index-internals-january-2012-sj-sql-server-meeting/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 21:01:59 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[Events]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=594</guid>
		<description><![CDATA[On 1/24 I am going to talk about Index Internals at the SJ SQL Server Meeting. Signup here to attend: http://sjpssug20120124.eventbrite.com/ TitleIndex Internals - How SQL Server stores and accesses data When01/24/2012 at 17:30 Where SJ SQL Server Meeting Haddon Heights Public Library 608 Station Ave Haddon Heights, NJ 08035]]></description>
			<content:encoded><![CDATA[<p>
On 1/24 I am going to talk about Index Internals at the SJ SQL Server Meeting. <br />Signup here to attend: <a href="http://sjpssug20120124.eventbrite.com/">http://sjpssug20120124.eventbrite.com/</a>
</p>
<style type="text/css"><!-- .WhenAndWhere td {padding:5px; border:1px solid #000;border-radius:3px;} --></style>
<table  class="WhenAndWhere">
<tr>
<td>Title</td><td><a href="http://sqlity.net/en/592/presentation-materials-index-internals/">Index Internals - How SQL Server stores and accesses data</a></td>
</tr>
<tr>
<td>When</td><td>01/24/2012 at 17:30</td>
</tr><tr>
<td>Where</td><td>
<div>
<a href="http://sjpssug20120124.eventbrite.com/">SJ SQL Server Meeting</a>
<br />Haddon Heights Public Library
<br />608 Station Ave
<br />Haddon Heights, NJ 08035
</div>
</td>
</tr>
</table><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/tqlsNpIlj50" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/594/2012-01-24-index-internals-january-2012-sj-sql-server-meeting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/594/2012-01-24-index-internals-january-2012-sj-sql-server-meeting/</feedburner:origLink></item>
		<item>
		<title>Presentation Materials – Index Internals</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/VFAT6Xe7GFM/</link>
		<comments>http://sqlity.net/en/592/presentation-materials-index-internals/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 20:46:16 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[Presentations]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=592</guid>
		<description><![CDATA[Abstract In this session we will examine one of SQL Servers' most fundamental yet most important functions - how does SQL Server store data on disk and how does it find it again when executing a query. Using a combination of demos, lecture and discussions we will begin with the structure of the data on <a href="http://sqlity.net/en/592/presentation-materials-index-internals/#more-592'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<h3>Abstract</h3>
<p>
In this session we will examine one of SQL Servers' most fundamental yet most important functions - how does SQL Server store data on disk and how does it find it again when executing a query. Using a combination of demos, lecture and discussions we will begin with the structure of the data on disk by delving into the world of HoBTs and Extends, RIDs and Keys and especially Pages. From there we will tackle topics concerning the retrieval of data such as Index levels, Index Scans and Seeks and Lookups.
</p>

<h3>Downloads</h3>
<ul>
 <li><a href="http://downloads.sqlity.net/IndexInternalsHandout.pdf">Slides</a></li>
 <li><a href="http://downloads.sqlity.net/TableInfo_IndexInfo.sql">TableInfo &amp; IndexInfo</a></li>
</ul>

<h3>Other Resources</h3>
<ul>
 <li> <a href="http://www.sqlsentry.com/plan-explorer/sql-server-query-view.asp" target="_blank">SQL Sentry Plan Explorer</a></li>
 <li><a target="_blank" href="http://www.amazon.com/gp/product/0735626243/ref=as_li_ss_tl?ie=UTF8&amp;tag=sqlitynet-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0735626243" id="static_txt_preview">Microsoft® SQL Server® 2008 Internals (Pro - Developer)</a></li>
</ul><img src="http://feeds.feedburner.com/~r/sqlitynet/~4/VFAT6Xe7GFM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/592/presentation-materials-index-internals/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/592/presentation-materials-index-internals/</feedburner:origLink></item>
		<item>
		<title>Selecting the entire Database as XML String</title>
		<link>http://feedproxy.google.com/~r/sqlitynet/~3/w5DKJ5BZg5Q/</link>
		<comments>http://sqlity.net/en/577/selecting-the-entire-database-as-xml-string/#comments</comments>
		<pubDate>Fri, 13 Jan 2012 17:09:15 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=577</guid>
		<description><![CDATA[I just ran into this question on StackOverflow: Dump data into single XML file from MS SQL Server 2008 R2. In there Daniel is looking for a solution to export his database as an XML file. I thought that request was an excellent way to practice my XML skills so I spend a little time <a href="http://sqlity.net/en/577/selecting-the-entire-database-as-xml-string/#more-577'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<p>
I just ran into this question on StackOverflow: <a href="http://stackoverflow.com/questions/8851638/dump-data-into-single-xml-file-from-ms-sql-server-2008-r2">Dump data into single XML file from MS SQL Server 2008 R2</a>.
In there Daniel is looking for a solution to export his database as an XML file.
</p>
<p>
I thought that request was an excellent way to practice my XML skills so I spend a little time to come up with a solution.
</p>
<p>
For testing purposes I first created a few tables in an empty database:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">

DECLARE @cmd NVARCHAR(MAX) ;
SET @cmd = ( SELECT 'CREATE TABLE T' + CAST(n AS NVARCHAR(MAX))
                    + '(id INT IDENTITY(1,1)'
                    + (SELECT   ',c' + CAST(c.n AS NVARCHAR(MAX))
                                + ' INT DEFAULT ' + CAST(c.n AS NVARCHAR(MAX))
                                + '*' + CAST(t.n AS NVARCHAR(MAX))
                       FROM     dbo.GetNums(( t.n - 1 ) % 10 + 1) c
             FOR    XML PATH('') ,
                        TYPE).value('.', 'NVARCHAR(MAX)') + ');INSERT INTO T'
                    + CAST(n AS NVARCHAR(MAX)) + ' DEFAULT VALUES;'
             FROM   dbo.GetNums (3000) t
FOR XML PATH('') ,
        TYPE).value('.', 'NVARCHAR(MAX)') ;

EXEC(@cmd) ;
</pre>
</div>
<p>
This snippet uses <a href="http://www.sqlmag.com/article/sql-server/virtual-auxiliary-table-of-numbers">Itzik's GetNums</a> function to generate 3000 CREATE TABLE and INSERT statements.
</p>
<p>
With the table in place I was able to come up with this solution to the original question:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
DECLARE @cmd NVARCHAR(MAX) ;
SET @cmd = 'SELECT * FROM('
    + STUFF(
(SELECT ' UNION ALL SELECT ''' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id))
        + '.' + QUOTENAME(name) + ''' [@name],' + '(SELECT * FROM '
        + QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + '.' + QUOTENAME(name)
        + ' FOR XML PATH(''row''),TYPE) data'
 FROM   sys.tables
    FOR     XML PATH('') ,
                TYPE).value('.', 'NVARCHAR(MAX)'), 1, 11, '')
    + ')X FOR XML PATH(''table''),TYPE;' ;

EXEC(@cmd) ;
</pre>
</div>
<p>
The script builds a single statement. For each table it selects the table name in a column with the name [@name] and the content of that table as XML (using a sub-select) in a column with the name data. All these SELECTs are "UNION ALL"ed together and formated into the final XML document with an additional outer SELECT FOR XML.
</p>
<p>
If you are only after some of the tables, you can add a where clause in row 8 of the above snippet.
</p>
<p>
The resulting XML has this format:
</p>
<div>
<pre class="brush: xml; title: ; notranslate">
&lt;table name=&quot;[dbo].[T1628]&quot;&gt;
  &lt;data&gt;
    &lt;row&gt;
      &lt;id&gt;1&lt;/id&gt;
      &lt;c1&gt;1628&lt;/c1&gt;
      &lt;c2&gt;3256&lt;/c2&gt;
      &lt;c3&gt;4884&lt;/c3&gt;
      &lt;c4&gt;6512&lt;/c4&gt;
      &lt;c5&gt;8140&lt;/c5&gt;
      &lt;c6&gt;9768&lt;/c6&gt;
      &lt;c7&gt;11396&lt;/c7&gt;
      &lt;c8&gt;13024&lt;/c8&gt;
    &lt;/row&gt;
  &lt;/data&gt;
&lt;/table&gt;
&lt;table name=&quot;[dbo].[T2102]&quot;&gt;
  &lt;data&gt;
    &lt;row&gt;
      &lt;id&gt;1&lt;/id&gt;
      &lt;c1&gt;2102&lt;/c1&gt;
      &lt;c2&gt;4204&lt;/c2&gt;
    &lt;/row&gt;
  &lt;/data&gt;
&lt;/table&gt;
</pre>
</div>
<p>
For each table there is a &lt;table&gt; tag that has a name attribut with the table name. It contains a &lt;data&gt; tag that in turn contains one &lt;row&gt; tag for every row in the table.
</p>
<p>
The only thing missing to make this valid XML is a root tag, but that should be easy to add.<img src="http://feeds.feedburner.com/~r/sqlitynet/~4/w5DKJ5BZg5Q" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/577/selecting-the-entire-database-as-xml-string/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://sqlity.net/en/577/selecting-the-entire-database-as-xml-string/</feedburner:origLink></item>
	</channel>
</rss>

