Tuesday, December 21, 2010

Azul's Pauseless Garbage Collector

Summary

At the JavaOne 2010 conference in San Francisco, Gil Tene, CTO of Azul Systems, discusses their pauseless garbage collector. In this interview, he explains the pauseless collection algorithm.
Bill Venners: One problem with deploying Java applications that require a lot of memory is that the more memory they use, the longer garbage collection pauses can become. Can you explain how Azul's pauseless garbage collector works?
Gil Tene: At Azul, we've been building high-performance Java virtual machines (JVMs) for the past eight years now. We started the company in 2002, and we aimed to solve several scalability and infrastructure problems for Java back then. An interesting thing is that garbage collection (GC) was probably number three or four on the list. It wasn't number one. If we're going to build big scalable Java systems, obviously you have to solve garbage collection, because you can't scale without it. But that was a corollary, not an upfront need.
We've been shipping a pauseless garbage collector for the last five years on our Vega systems. First it was a simple single-generation garbage collector, which was quickly followed by a generational pauseless collector for high throughput.
With the Zing platform, which we've just recently announced and will be starting to ship soon, we've taken that entire stack, a stack that can deliver a pauseless collector and scalable JVM into pretty much any OS, and brought it to a pure software product. We can do this now on top of commodity x86 hardware, on commodity hypervisors like VMware and KVM. Our JVM runs on Linux, Solaris, Windows, HP-UX, AIX, and pretty much any operating system out there. The JVM gets virtualized from that OS into our stack, which includes our own underlying operating environment that allows us to run this pauseless collection, scalable JVM.

Solving the hardest problem all the time

At the heart of all this is a Java Virtual Machine that has a lot of interesting features, one of which is a fundamentally different way of doing garbage collection. It's a collector that's designed to concurrently do everything that a collector needs to do, and also avoid doing anything that's rare. We didn't take the typical approach where you try to optimize for the common fast case, but remain stuck with some things that are really hard to do, which you push into the future. Then you tune and tune to make those events rare, maybe once every ten minutes or every hour, but they are going to happen. We took the opposite approach. We figured that to have a smooth, wide operating range and high scalability we pretty much have to solve the hardest problem all the time. If we do that well, then the rest doesn't matter. Our collector really does the only hard thing in garbage collection, but it does it all the time. It compacts the heap all the time and moves objects all the time, but it does it concurrently without stopping the application. That's the unique trick in it, I'd say, a trick that current commercial collectors in Java SE just don't do.
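
To make "moves objects all the time ... without stopping the application" a little more concrete, here is a toy Java sketch of one general way an object can be relocated while another thread keeps reading it. It is not Azul's implementation, and the excerpt above does not describe the mechanism. The forwarding pointer, the barrier on each load, and every name (Cell, Ref, relocate, run) are assumptions made up for illustration. A real collector moves raw memory rather than Java objects.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only (all names hypothetical): shows concurrent relocation,
// not how any real JVM collector is implemented.
class ConcurrentMoveSketch {

    // A toy heap object: a payload plus a forwarding pointer that the
    // "collector" sets once it has copied the object to a new location.
    static final class Cell {
        final int value;
        final AtomicReference<Cell> forwardedTo = new AtomicReference<>();
        Cell(int value) { this.value = value; }
    }

    // A reference held by the application.
    static final class Ref {
        private final AtomicReference<Cell> target;
        Ref(Cell initial) { target = new AtomicReference<>(initial); }

        // Barrier on every load: follow forwarding pointers and repair the
        // stored reference, so the caller never works on a stale copy.
        Cell load() {
            Cell current = target.get();
            Cell moved;
            while ((moved = current.forwardedTo.get()) != null) {
                target.compareAndSet(current, moved); // losing this race is harmless
                current = moved;
            }
            return current;
        }
    }

    // Collector side: copy the object, then publish its new location.
    static Cell relocate(Cell old) {
        Cell copy = new Cell(old.value);
        if (old.forwardedTo.compareAndSet(null, copy)) {
            return copy;
        }
        return old.forwardedTo.get(); // someone else already moved it
    }

    // An application thread keeps reading while the calling thread keeps
    // relocating. Returns true if every read saw the expected payload.
    static boolean run(int relocations) {
        Ref ref = new Ref(new Cell(42));
        boolean[] ok = { true };
        Thread app = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                if (ref.load().value != 42) { ok[0] = false; }
            }
        });
        app.start();
        for (int i = 0; i < relocations; i++) {
            relocate(ref.load());
        }
        app.interrupt();
        try {
            app.join(); // join makes the reader's write to ok[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ok[0] && ref.load().value == 42;
    }

    public static void main(String[] args) {
        System.out.println("all reads consistent: " + run(100_000));
    }
}
```

The point of the sketch is only that the reading thread is never paused: the object is relocated underneath it, and each load quietly picks up the new location.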

Read more: artima developer