Thursday, May 26, 2011

C++ at Google: Here Be Dragons

Google has one of the largest monolithic C++ codebases in the world. We have thousands of engineers working on millions of lines of C++ code every day. To help keep the entire thing running and all these engineers fast and productive we have had to build some unique C++ tools, centering around the Clang C++ compiler. These help engineers understand their code and prevent bugs before they get to our production systems.

(Cross-posted on the Google Engineering Tools Blog)
Of course, improving the speed of Google engineers—their productivity—doesn’t always correlate to speed in the traditional sense. It requires the holistic acceleration of Google’s engineering efforts. Making any one tool faster just doesn’t cut it; the entire process has to be improved from end to end.
As a performance junkie, I like to think of this in familiar terms. It’s analogous to an algorithmic performance improvement. You get “algorithmic” improvements in productivity when you reduce the total work required for an engineer to get the job done, or fundamentally shift the time scale that the work requires. However, improving the time a single task requires often runs afoul of all the adages about performance tuning, 80/20 rules, and the pitfalls of over-optimizing.
One of the best ways to get these algorithmic improvements to productivity is to completely remove a set of tasks. Let’s take the task of triaging and debugging serious production bugs. If you’ve worked on a large software project, you’ve probably seen bugs which are somehow missed during code review, testing, and QA. When these bugs make it to production they cause a massive drain on developer productivity as the engineers cope with outages, data loss, and user complaints.
What if we could build a tool that would find these exact kinds of bugs in software automatically? What if we could prevent them from ever bringing down a server, reaching a user’s data, or causing a pager to go off? Many of these bugs boil down to simple C++ programming errors. Consider this snippet of code:


Response ProcessRequest(Widget foo, Whatsit bar, bool *charge_acct) {
  // Do some fancy stuff...
  if (/* Detect a subscription user */) {
    charge_acct = false;
  }
  // Lots more fancy stuff...
}

Do you see the bug? Careful testing and code reviews catch these and other bugs constantly, but inevitably one will sneak through, because the code looks fine. It says that it shouldn’t charge the account right there, plain as day. Unfortunately, C++ insists that ‘false’ is the same as ‘0’ which can be a pointer just as easily as it can be a boolean flag. This code sets the pointer to NULL, and never touches the flag.

Humans aren’t good at spotting this type of devious typo, any more than humans are good at translating C++ code into machine instructions. We have tools to do that, and the tool of choice in this case is the compiler. Not just any compiler will do, because while the code above is one example of a bug, we need to teach our compiler to find lots of other examples. We also have to be careful to make certain that developers will act upon the information these tools provide. Within Google’s C++ codebase, that means we break the build for every compiler diagnostic, even warnings. We continually need to enhance our tools to find new bugs in new code based on new patterns, all while maintaining enough precision to immediately break the build and have high confidence that the code is wrong.

To address these issues we started a project at Google which is working with the LLVM Project to develop the Clang C++ compiler. We can rapidly add warnings to Clang and customize them to emit precise diagnostics about dangerous and potentially buggy constructs. Clang is designed as a collection of libraries with the express goal of supporting diverse tools and application uses. These libraries can be directly integrated into IDEs and commandline tools while still forming the core of the compiler itself.

Read more: LLVM PROJECT BLOG