Finding bad queries is a big part of optimization. A scientific optimization process can be simplified to “can anything be improved for less than it costs not to improve it? – if not, we’re done.” In databases, we care most about the work the database is doing. That is, queries. There are other things we care about, but not as much as queries. The key here is that the question “can anything be improved” requires a process for identifying queries that can be improved. The easiest way to do that is to look for things that indicate sub-optimality.
All of the above might seem to be obvious, but I wanted to frame the rest of this post with a logical starting point. Next let’s see what kinds of things about queries might indicate that they aren’t optimal.
A responsible approach to a task such as finding bad queries begins with a definition of badness that we can be pretty sure is complete. I am interested in knowing whether I’m missing anything, but I believe the definition of badness in this article is fairly complete. (Post comments if I’m wrong.)
Here is my definition:
All of the above might seem to be obvious, but I wanted to frame the rest of this post with a logical starting point. Next let’s see what kinds of things about queries might indicate that they aren’t optimal.
A responsible approach to a task such as finding bad queries begins with a definition of badness that we can be pretty sure is complete. I am interested in knowing whether I’m missing anything, but I believe the definition of badness in this article is fairly complete. (Post comments if I’m wrong.)
Here is my definition:
- Queries are bad either singly (an individual query is bad) or in aggregate (a group or class of queries is bad). I think this is a complete definition because the only other kind of number besides “one” and “many” is “zero”, and a nonexistent query can’t be bad.
- Queries, or groups of queries, are bad because
- they are slow and provide a bad user experience, or
- they add too much load to the system, or
- they block other queries from running
- It finds queries (or groups of queries) that are slow and provide a bad user experience by the parameters to the –outliers option. If a query is slower than X more than N times, it’s bad. You can look at factors such as the variance-to-mean ratio, the Apdex score, and the explain sparkline to see whether the query is likely to be possible to optimize.
This last three-part definition is where I can’t be sure that I’ve got a complete definition.
Now, given the definition above, how do we find these queries in some input such as a log of queries? It shouldn’t surprise you that there is a working implementation of most of this in a tool, Maatkit’s mk-query-digest. Here’s how: