Thursday, May 20, 2010

The Glasgow Haskell Compiler and LLVM

If you read the LLVM 2.7 release notes carefully you would have noticed that one of the new external users is the Glasgow Haskell Compiler (GHC). As the author of the LLVM backend for GHC, I have been invited to write a post detailing the design of the backend and my experiences with using LLVM. This is that post :).

I began work on the backend around July last year, undertaking it as part of an honours thesis for my bachelor of Computer Science. Currently the backend is quite stable and capable on Linux x86, able to bootstrap GHC itself. Other platforms haven't received any attention yet.

What is GHC and Haskell

GHC is a compiler for Haskell, a standardized, lazy, functional programming language. Haskell supports features such as static typing with type inference, lazy evaluation, pattern matching, list comprehension, type classes and type polymorphism. GHC is the most popular Haskell compiler, it compiles Haskell to native code and is supported of X86, PowerPC and SPARC.

Existing pipeline

Before the LLVM backend was added, GHC already supported two backends, a C code generator and a native code generator (NCG).

The C code generator was the first backend implemented and it works pretty well but is slow and fragile due to its use of many GCC specific extensions and need to post processes the assembly code produced by GCC to implement optimisations which aren't possible to do in the C code. The native code generator was started later to avoid these problems. It is around 2-3x quicker than the C backend and generally reduces the runtime of a Haskell program by around 5%. GHC developers are hoping to depreciate the C backend in the next major release.

Why an LLVM backend?

Offload work: Building a high performance compiler backend is a huge amount of work, LLVM for example was started around 10 years ago. Going forward, the LLVM backend should be a lot less work to maintain and extend than either the C backend or NCG. It will also benefit from any future improvements to LLVM.
Optimisation passes: GHC does a great job of producing fast Haskell programs. However, there are a large number of lower level optimisations (particularly the kind that require machine specific knowledge) that it doesn't currently implement. Using LLVM should give us most of them for free.
The LLVM Framework: Perhaps the most appealing feature of LLVM is that it has been designed from the start to be a compiler framework. For researchers like the GHC developers, this is a great benefit and makes LLVM a very fun playground. For example, within a couple of days of the public release of the LLVM backend one developer, Don Stewart, wrote a genetic algorithm to find the best LLVM optimisation pipeline to use for various Haskell programs (you can find his blog post about this here).


Read more: LLVM PROJECT BLOG

Posted via email from jasper22's posterous