Understanding Storage in SQL Server
Table of Contents
At this point, I want to take a moment to help you understand the database storage a bit more. I am not talking about RAIDs, SANs, or having the transaction log on a different drive either. What I mean is to talk about how the database actually stores the attributes (columns) and data records (rows).
Pages and Extents
To start off, I would like to dig a bit into pages and extents. The basic unit of data storage in SQL Server is the page. Disk I/O operations are performed at the page level, meaning that SQL Server must read or write whole pages. Pages make up 8 KB of data, which allows 128 pages for each MB allocated to SQL Server’s data files (MDF and LDF).
Below is an outline of a page with how it is made up.
[Diagram: SQL Server page structure — 96-byte header, data rows, 2-byte slot array]
Extents
Extents are a collection of eight physically sequential pages and are used to efficiently manage the pages. All pages are stored in extents. However, to make space allocation efficient, SQL Server does not allocate whole extents to tables at one time. This creates two different types of extents: mixed and uniform.
Until objects (tables, indexes, etc.) have enough pages to create a uniform extent, they are a mixed extent. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations.
Row Data Structures
Zooming in further, a row can have three types of data structures:
- LOB_DATA — Stored off-page when the data in the LOB column exceeds 8,060 bytes. This is only possible through certain data types:
VARCHAR(MAX)NVARCHAR(MAX)VARBINARY(MAX)XML- CLR user-defined types
- IN_ROW_DATA — Every row has a portion of its data stored in this structure, if not all of it.
- Cannot span pages
- All fixed-width columns must be stored in this structure
- Limited to 8,060 bytes due to the header (96 bytes), slot array (2 bytes), and reserved space for Microsoft (34 bytes)
- This means you cannot create 10
CHAR(1000)columns in the table, since these are fixed width - However, you can create 10
VARCHAR(1000)since they are variable width. But note that if you exceed 8,060 bytes in the table you will overflow into the structure below.
- ROW_OVERFLOW_DATA — This only happens with certain data types:
VARCHAR(x)NVARCHAR(x)VARBINARY(x)SQL_VARIANT- When this happens, it can cause two major issues:
- Poor performance — If the column overflows, you need an extra I/O to retrieve the column
- Data integrity issues — For example, a
VARCHAR(2000)column for an address allows users to input garbage data
Wrapping Up
Now that we got through this, I hope it gives you some basic insight into the data structures. In the following post, we will go over why this was important for data density reasons and other performance concerns.