Wednesday, November 24, 2010

Grid, Cloud, HPC … What’s the Diff?

It’s always nice when another piece of the puzzle comes into focus.  In this case, my time speaking at the first ever International Super Computer (ISC) Cloud Conference the week before last was well spent.  The conference was heavily attended by those out of the grid computing space and I learned a lot about both cloud and grid.  In particular, I think I finally understand what causes some to view grid as a pre-cursor to cloud while others view it as a different beast only tangentially related.
This really comes down to a particular TLA in use to describe grid: High Performance Computing or HPC.  HPC and grid are commonly used interchangeably.  Cloud is not HPC, although now it can certainly support some HPC workloads, née Amazon’s EC2 HPC offering.  No, cloud is something a little bit different:  High Scalability Computing or simply HSC here.
Let me explain in some depth …

Scalability vs. Performance
First it’s critical for readers to understand the fundamental difference between scalability and performance.  While the two are frequently conflated, they are quite different.  Performance is the capability of particular component to provide a certain amount of capacity, throughput, or ‘yield’.  Scalability, in contrast, is about the ability of a system to expand to meet demand.  This is quite frequently measured by looking at the aggregate performance of the individual components of a particular system and how they function over time.
Put more simply, performance measures the capability of a single part of a large system while scalability measures the ability of a large system to grow to meet growing demand.
Scalable systems may have individual parts that are relatively low performing.  I have heard that the Amazon.com retail website’s web servers went from 300 transactions per second (TPS) to a mere 3 TPS each after moving to a more scalable architecture.  The upside is that while every web server might have lower individual performance, the overall system became significantly more scalable and new web servers could be added ad infinitum.
High performing systems on the other hand focus on eking out every ounce of resource from a particular component, rather than focusing on the big picture.  One might have high performance systems in a very scalable system or not.
For most purposes, scalability and performance are orthogonal, but many either equate them or believe that one breeds the other.

Grid & High Performance Computing
The origins of HPC/Grid exist within the academic community where needs arose to crunch large data sets very early on.  Think satellite data, genomics, nuclear physics, etc.  Grid, effectively, has been around since the beginning of the enterprise computing era, when it became easier for academic research institutions to move away from large mainframe-style supercomputers (e.g. Cray, Sequent) towards a more scale-out model using lots of relatively inexpensive x86 hardware in large clusters.  The emphasis here on *relatively*.
Most x86 clusters today are built out for very high performance *and* scalability, but with a particular focus on performance of individual components (servers) and the interconnect network for reasons that I will explain below.  The price/performance of the overall system is not as important as aggregate throughput of the entire system.  Most academic institutions build out a grid to the full budget they have attempting to eke out every ounce of performance in each component.

Read more: cloud scaling