Understanding HEAP Tables

Zooming out further, we should take a look at the table structures. By this, I am of course talking about HEAPS and Clustered Tables. Though, the main focus of the post will be Heaps, it is hard to talk about them without the other.

The official definition for a heap “…is a table without a clustered index.” This has always been a wonky definition to me. Who defines something by saying it’s not something else? But, the reason they did it is because it is correct. This is the only difference between a heap and a clustered table.

To explain what a heap IS though, I find it easier to list the properties:

Made by using CREATE TABLE syntax without a clustering key
CAN have a primary key - with a non clustered index
Can have non-clustered indexes

So in reality, a heap is a table that does not store data in a specific order. Because of this, these heap tables can affect performance in two primary ways:

Full table scans
Forwarding rows

Full Table Scans

To examine any row in the table heaps have to perform a full table scan. This in itself should tell you that this will cause problems. These scans are S…L…O…W… due to the heavy amount of I/O reads required. This also causes a high amount of disk to memory swapping caused by SQL loading the table into memory.

Forwarding Rows

These ONLY exist in a heap table structure. Once the record outgrows the page, SQL Server moves the data to another page. It then leaves a Forwarding pointer/row at the original location to point to the new location. This entry can cause I/O increases that you wouldn’t expect. How the I/O problems arise:

The query you are running uses a record locator. This points to a page and the DB engine has to go to that page.
Upon arriving, the locator sees the forwarding pointer.
It then goes to the new page to get the forwarded row.

So why do we have them?

The main reason is that they do have their place. There are specific use cases where you would want them. Microsoft does a decent job at explaining them on their article here.

But again, in most cases they are not appropriate and will cause many problems in production. This is why by default SQL server will default the primary key to be your clustering key. (Not always the best either, will discuss on the next post)

Fun Part

So now that we have the definitions out of the way we can do the demo to see how bad they actually are.

First, let’s create a heap table and insert 1000000 rows.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE TABLE dbo.HeapingHeap
(
Id BIGINT IDENTITY NOT NULL,
column1 VARCHAR(1024)
DEFAULT REPLICATE('A',200) NOT NULL
);
GO

INSERT INTO HeapingHeap DEFAULT VALUES
GO 1000000

So now that we have our table setup and we can see the data, let’s look at performance.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON;
SELECT *
FROM HeapingHeap
GO
SET STATISTICS IO OFF;
GO

Here is what I received:

Table ‘HeapingHeap’. Scan count 1, logical reads 28572, physical reads 0, read-ahead reads 28550, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Lets also look at how many pages we have in this table with the query to analyze physical stats. With one key change, we need to add forwarding rows.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
[alloc_unit_type_desc] AS [Data Structure]
, [page_count] AS [pages]
, [record_count] AS [Rows]
, [min_record_size_in_bytes] AS [min row]
, [max_record_size_in_bytes] AS [Max Row]
, [forwarded_record_count] AS [Forwarded Rows]
FROM SYS.dm_db_index_physical_stats
(DB_id()
,OBJECT_ID (N'HeapingHeap')
, NULL
, NULL
, N'detailed')

Here is what I received:

This makes sense, As the logical reads = the amount of pages. This is good, performance is what we would expect. But what happens if we change the data on the heap? Let’s update the table to have 50% of the rows now be a bigger column.

1
2
3
4
UPDATE dbo.HeapingHeap
SET column1=REPLICATE('Z',1000)
WHERE Id % 2 = 0;
GO

Again, let’s run our query for performance.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON;
SELECT *
FROM HeapingHeap
GO
SET STATISTICS IO OFF;
GO

We now receive a new number:

Table ‘HeapingHeap’. Scan count 1, logical reads 485715, physical reads 7145, read-ahead reads 28543, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

This is a huge performance issue, now we are hitting the disks harder. Yet, we are still returning the same count of rows. How did this happen? What caused it to change?

This was the act of the forwarding rows on the table. Let’s take a look at our physical table again.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
SELECT
[alloc_unit_type_desc] AS [Data Structure]
, [page_count] AS [pages]
, [record_count] AS [Rows]
, [min_record_size_in_bytes] AS [min row]
, [max_record_size_in_bytes] AS [Max Row]
, [forwarded_record_count] AS [Forwarded Rows]
FROM SYS.dm_db_index_physical_stats
(DB_id()
,OBJECT_ID (N'HeapingHeap')
, NULL
, NULL
, N'detailed')

If you notice, we now have 400,000 new forwarding rows that did not exist before. This is where our extra I/Os are coming from. Remember, as mentioned earlier you are causing SQL to go and fetch each one of these rows.

For us, the simple way to look at predicting the I/Os of this query is to use this equation.

Logical Reads = allocated pages + forwarded rows

How to fix this

To fix heaps there are two things to do:

Add a clustered index so the table won’t be a HEAP.
If you NEED the heap in your database, use ALTER TABLE tablename REBUILD to rebuild the table.

NOTE: People stumble because there is no easy way to identify the table is a heap. The only thing you can check for is the lack of clustering index. The best way to tell is to run a script on the database.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
SELECT SCH.name AS [Schema Name] ,
OBJ.name AS [Table_Name]
FROM sys.indexes INX
JOIN sys.objects OBJ
ON INX.object_id = OBJ.object_id
JOIN sys.schemas SCH
ON OBJ.schema_id = SCH.schema_id
WHERE OBJ.is_ms_shipped = 0 --filters SQL objects
AND INX.index_id = 0 --meaning it has none
AND OBJ.type = 'U'; --user table
GO

This gives us a view of all the tables inside your database that are heaps.

Understanding HEAP Tables

Full Table Scans

Forwarding Rows

So why do we have them?

Fun Part

How to fix this

Related Posts

Volume Shadow Copy Service (VSS)

Vertical Partitioning

Understanding Storage in SQL Server

Comments