Wednesday, August 25, 2010

Database assisted denormalization

Let us say that I have the homepage of the application, where we display Blogs with their Post count, using the following query:
select
   dbo.Blogs.Id,
   dbo.Blogs.Title,
   dbo.Blogs.Subtitle,
   (select COUNT(*) from Posts where Posts.BlogId = Blogs.Id) as PostCount
from dbo.Blogs
Given what I think thoughts of denormalization, and read vs. write costs, it seems a little wasteful to run the aggregate all the time.
I can always add a PostCount property to the Blogs table, but that would require me to manage that myself, and I thought that I might see whatever the database can do it for me.
This isn’t a conclusive post, it details what I tried, and what I think is happening, but it isn’t the end all be all. Moreover, I run my tests on SQL Server 2008 R2 only, not on anything else. I would like to hear what you think of this.
My first thought was to create this as a persisted computed column:
ALTER TABLE Blogs
ADD PostCount AS (select COUNT(*) from Posts where Posts.BlogId = Blogs.Id) PERSISTED
But you can’t create computed columns that uses subqueries. I would understand easier why not if it was only for persisted computed columns, because that would give the database a hell of time figuring out when that computed column needs to be updated, but I am actually surprised that normal computed columns aren’t supporting subqueries.
Given that my first attempt failed, I decided to try to create a materialized view for the data that I needed. Materialized views in SQL Server are called indexed views, There are several things to note here. You can’t use subqueries here either (likely because the DB couldn’t figure which row in the index to update if you were using subqueries), but have to use joins.
Read more: Ayende @ Rahien