Many experienced database professionals have acquired a somewhat jaded view of parallel query execution. Sometimes, this is a consequence of bad experiences with older versions of SQL Server. Just as frequently, however, this view is the result of misconceptions, or an otherwise incomplete mastery of the techniques required to effectively design and tune queries for parallel execution.
This is the first in a series of articles that will provide the reader with the deep knowledge necessary to make advanced use of the parallel query processing features available in Microsoft SQL Server. Part one provides a step-by-step guide to the fundamentals of parallelism in SQL Server, introducing concepts such as parallel scans and seeks, workers, threads, tasks, execution contexts, and the exchange operators that coordinate parallel activity.
Future instalments will provide further insights into the inner workings of the database engine, and show how targeted parallelism can benefit many real-world environments, not just the data warehousing and decision-support systems normally associated with its use. Systems that are often thought of as having a primarily transaction-processing (OLTP) workload often contain queries and procedures that could benefit from the appropriate use of parallelism.
Perhaps inevitably, this and subsequent instalments contain quite deep technical content in places. Making the most effective use of parallelism requires a good understanding of how things like scheduling, query optimization, and the execution engine really work. Nevertheless, it is hoped that even those who are completely new to the topic will find this series informative and useful.
What is Parallelism?
You have probably heard the phrase "many hands make light work". The idea is that splitting a task among a number of people results in each person doing less. From the individual's perspective, the job seems much easier, even though a similar amount of work is being done overall. More importantly, if the extra people can perform their allocation of work at the same time, the total time required for the task is reduced.
Counting Jelly Beans
Imagine you are presented with a large glass jar full of assorted jelly beans, and asked to count how many there are. Assuming you are able to count beans at an average rate of five per second, it would take you a little over ten minutes to determine that this particular jar contains 3,027 jelly beans.
If four of your friends offer to help with the task, you could choose from a number of potential strategies, but let's consider one that closely mirrors the sort of strategy that SQL Server would adopt. You seat your friends around a table with the jar at its centre, and a single scoop to remove beans from the jar. You ask them to help themselves to a scoop of beans whenever they need more to count. Each friend is also given a pen and a piece of paper, to keep a running total of the number of beans they have counted so far.
Read more: Simple-talk
This is the first in a series of articles that will provide the reader with the deep knowledge necessary to make advanced use of the parallel query processing features available in Microsoft SQL Server. Part one provides a step-by-step guide to the fundamentals of parallelism in SQL Server, introducing concepts such as parallel scans and seeks, workers, threads, tasks, execution contexts, and the exchange operators that coordinate parallel activity.
Future instalments will provide further insights into the inner workings of the database engine, and show how targeted parallelism can benefit many real-world environments, not just the data warehousing and decision-support systems normally associated with its use. Systems that are often thought of as having a primarily transaction-processing (OLTP) workload often contain queries and procedures that could benefit from the appropriate use of parallelism.
Perhaps inevitably, this and subsequent instalments contain quite deep technical content in places. Making the most effective use of parallelism requires a good understanding of how things like scheduling, query optimization, and the execution engine really work. Nevertheless, it is hoped that even those who are completely new to the topic will find this series informative and useful.
What is Parallelism?
You have probably heard the phrase "many hands make light work". The idea is that splitting a task among a number of people results in each person doing less. From the individual's perspective, the job seems much easier, even though a similar amount of work is being done overall. More importantly, if the extra people can perform their allocation of work at the same time, the total time required for the task is reduced.
Counting Jelly Beans
Imagine you are presented with a large glass jar full of assorted jelly beans, and asked to count how many there are. Assuming you are able to count beans at an average rate of five per second, it would take you a little over ten minutes to determine that this particular jar contains 3,027 jelly beans.
If four of your friends offer to help with the task, you could choose from a number of potential strategies, but let's consider one that closely mirrors the sort of strategy that SQL Server would adopt. You seat your friends around a table with the jar at its centre, and a single scoop to remove beans from the jar. You ask them to help themselves to a scoop of beans whenever they need more to count. Each friend is also given a pen and a piece of paper, to keep a running total of the number of beans they have counted so far.
Read more: Simple-talk