Tuesday, March 22, 2011

Indexes in MS SQL Server

Introduction
I bought a book from Crossword; he packed the book and added two bookmarks into my pack. A thought came to my mind. Why do we need this bookmark? I can easily memorize the page number and next time resume from the same page when I resume reading, or read them all over to reach to the point where I stopped reading. But not all have a blessed memory; moreover, there are better things to remember, my grand pa would rather bookmark and rely on it to help him resume reading. It’s a kind of simple index, isn’t it?

This article focuses on how MS SQL Server uses indexes to read and write data. Logically, data is stored in record sets in a table. We have fields identifying the type of data contained in each of the record sets. Tables are a collection of record sets which are either in the form of unorganized heaps or organized clustered index. By default, tables are stored in the form of heaps where the next inserted record is simply added in the next available space on the table page. Data is arranged by SQL Server in the form of extents and pages. Each extent is of size 64 KB having 8 pages of 8KB sizes. An extent may have data of multiple or same table, but each page holds data from a single table only.

So resuming with the discussion, each inserted record by default is added to the next available row into the data page. And there is no attempt to keep the data organized or sorted by default. This option seems excellent for adding data to the table, but does not provide an optimum solution when there is an attempt to retrieve the data. Suppose in a library the books are organized simply in the order they are received. There is no sort upon author, genre, or title. There would be no problem in adding or storing the books into our library. But how about getting a book for issue? Bizarre, isn’t it? This would lead to a full scan of all the books to get the required book. A tough time is guaranteed.

This is exactly how SQL Server works too. Here’s when the index chips in.
Note: All code has been tested on MS SQL Server 2008 R2.

Table Indexes

A SQL Server table by default stores data as heaps. A heap is a table that does not have any clustered index defined on it. A table stored as a heap has no enforced physical order, but a clustered index does. Data is inserted into the heap table as described in the library example.
Heaps work very well for storing data, and are very efficient in handling new records, but they are not so great when it comes to finding specific data in a table. This is where indexes come in. SQL Server supports two basic types of indexes: clustered and non-clustered. It also supports XML indexes, which is not discussed in this article; XML indexes are quite different from the regular relational indexes that will be used to locate the majority of data in database tables.
The key difference between clustered and non-clustered indexes is the leaf level of the index. In non-clustered indexes, the leaf level contains pointers to the data. In a clustered index, the leaf level of the index is the actual data.

Read more: Codeproject