Monday, July 18, 2011

Build in the Cloud: Accessing Source Code



This is the first in a four-part series describing how we use the cloud to scale building and testing of software at Google. This series elaborates on a presentation given during the Pre-GTAC 2010 event in Hyderabad. Please see our first post for more details on the types of problems we are solving in Engineering Tools at Google.

Much of our day-to-day activity as software engineers involves source code. When we join a project, one of the first things we do is look at the source. We want to build it, run it, experiment with changes, test it, and challenge our assumptions about how it works. For most of us this means we start by “checking out” the source from version control. For small to moderately sized projects almost any reasonable version control system is adequate. But as the number of engineers and the size of the code base grow, the load can strain the version control system and reduce engineer productivity.

Here at Google, all products are built from head. This approach has advantages: the code is open for anyone to explore and tinker with, it avoids the headaches associated with merging long-lived branches, and building from source ensures there are no binary compatibility issues between libraries. The downside is that, with over a hundred million lines of code, checking out takes a long time. And because Google is a global company, checkout times are amplified in our distributed offices. By computing dependency graphs and using this information to limit the number of files checked out, we have been somewhat successful in reducing checkout time. However, computing dependencies also takes time, and even with this improvement checkouts still took too long.
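Limiting a checkout with a dependency graph amounts to a transitive-closure walk: start from the targets you want to build, follow their dependencies, and collect only the source files those targets reach. The sketch below illustrates that idea in Python. The graph shape (a dict mapping a hypothetical target label to a pair of source files and dependency labels) and the function name `files_to_check_out` are assumptions for illustration, not Google's actual build metadata or tooling.

```python
from collections import deque

def files_to_check_out(targets, graph):
    """Return the set of source files needed to build `targets`.

    `graph` is a hypothetical structure mapping a target label to a
    (sources, dependencies) pair. Only files reachable from `targets`
    are returned; everything else can be left out of the checkout.
    """
    needed_files = set()
    seen = set(targets)
    queue = deque(targets)  # breadth-first walk over the dependency graph
    while queue:
        target = queue.popleft()
        sources, deps = graph[target]
        needed_files.update(sources)
        for dep in deps:
            if dep not in seen:  # `seen` also guards against cycles
                seen.add(dep)
                queue.append(dep)
    return needed_files

# Hypothetical example graph; the labels and paths are made up.
example_graph = {
    "//app:server": (["app/server.cc"], ["//lib:net"]),
    "//lib:net": (["lib/net.cc", "lib/net.h"], ["//base:core"]),
    "//base:core": (["base/core.cc"], []),
    "//tools:unrelated": (["tools/unrelated.cc"], []),
}

print(sorted(files_to_check_out(["//app:server"], example_graph)))
```

Building `//app:server` here pulls in four files and skips `tools/unrelated.cc`. Note that the walk itself visits every reachable target, which is why, as described above, computing dependencies adds its own cost on a very large tree.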

Engineer time lost to checking out code is the most obvious cost, but the true cost is much higher. Automated build and testing systems also need access to source code. Time spent checking out code in these systems lengthens the feedback cycle, which decreases their utility. It also increases the complexity of these systems, since they must maintain state on a file system and interact closely with the version control system for what is essentially read-only access to source code.

In fact, we have found that engineers check out and edit a very small amount of code relative to the amount read to perform builds. This is because we always build from source, and changes tend to be localized to a small part of our source tree. So, both engineers and automated systems primarily need quick, read-only access to the large quantity of unedited code required to perform their builds. The unedited code itself is immutable, since it doesn’t change once it’s checked in to the version control system. This means we can use Google infrastructure to mirror all version control information in the cloud as a way to provide fast and scalable read-only access to source code.
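Immutability is what makes mirroring attractive: the contents of a file at a given revision never change once checked in, so a copy keyed by (revision, path) can be cached indefinitely without any invalidation logic. The Python sketch below shows that property in miniature. The class name `ReadOnlySourceMirror` and the `fetch` callable (standing in for a version control backend) are hypothetical; the post does not describe the mirror's interface, and a real cloud mirror would be a shared, distributed store rather than a per-process dictionary.

```python
from typing import Callable, Dict, Tuple

class ReadOnlySourceMirror:
    """Serve file contents at a fixed revision from an immutable cache.

    `fetch` is a hypothetical callable standing in for the version control
    system; it is consulted only on a cache miss. There is deliberately no
    write method: clients get read-only access.
    """

    def __init__(self, fetch: Callable[[int, str], bytes]) -> None:
        self._fetch = fetch
        # Keyed by (revision, path). Checked-in content never changes,
        # so entries never need to be invalidated.
        self._cache: Dict[Tuple[int, str], bytes] = {}
        self.backend_reads = 0  # how often the backend was actually hit

    def read(self, revision: int, path: str) -> bytes:
        key = (revision, path)
        if key not in self._cache:
            self.backend_reads += 1
            self._cache[key] = self._fetch(revision, path)
        return self._cache[key]

# Hypothetical in-memory "repository" used only for this example.
repo = {(1, "lib/net.cc"): b"v1", (2, "lib/net.cc"): b"v2"}
mirror = ReadOnlySourceMirror(lambda rev, path: repo[(rev, path)])
mirror.read(1, "lib/net.cc")  # miss: fetched from the backend
mirror.read(1, "lib/net.cc")  # hit: served from the cache
mirror.read(2, "lib/net.cc")  # a new revision is a new, independent key
print(mirror.backend_reads)
```

Because a new revision produces a new key rather than modifying an old entry, many engineers and automated systems can share the same cached data without coordinating with the version control system.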


Read more: Google Engineering Tools blog

Posted via email from Jasper-Net