Sunday, June 10, 2012

Create a working compiler with the LLVM framework, Part 1

LLVM (formerly the Low Level Virtual Machine) is an extremely powerful compiler infrastructure framework designed for compile-time, link-time, and run-time optimization of programs written in your favorite programming language. LLVM works on several different platforms, and its primary claim to fame is generating code that runs fast.

The LLVM framework is built around a well-documented intermediate representation (IR) of code. This article, the first in a two-part series, delves into the basics of the LLVM IR and some of its subtleties. From there, you will build a code generator that automates the work of generating the LLVM IR for you. With an LLVM IR generator in hand, all you need is a front end for your favorite language to plug into it, and you have a full flow (front-end parser + IR generator + LLVM back end). Creating a custom compiler just got simpler.

Getting started with the LLVM

Before you start, you must have LLVM compiled on your development computer (see Resources for a link). The examples in this article are based on LLVM version 3.0. Once LLVM is built and installed, the two most important tools for working with LLVM code are llc and lli.

llc and lli

Because LLVM was conceived as a virtual machine (VM), it has its own intermediate byte code representation. Ultimately, you need to compile LLVM byte code into your platform-specific assembly language. Then you can run the assembly code through a native assembler and linker to generate executables, shared libraries, and so on. You use llc to convert LLVM byte code to platform-specific assembly code (see Resources for a link to more information about this tool). You can also execute LLVM byte code directly, so you need not wait until the native executable crashes to discover that your program has a bug or two. This is where lli comes in handy: it executes the byte code directly, either through an interpreter or by using a just-in-time (JIT) compiler under the hood. See Resources for a link to more information about lli.

llvm-gcc

llvm-gcc is a modified version of the GNU Compiler Collection (gcc) that can generate LLVM byte code in its human-readable text form (also known as LLVM assembly) when run with the -S -emit-llvm options. You can then use lli to execute the generated code. For more information about llvm-gcc, see Resources. If you don't have llvm-gcc preinstalled on your system, you should be able to build it from sources; see Resources for a link to the step-by-step guide.

Hello World with LLVM

To better understand LLVM, you have to learn LLVM IR and its idiosyncrasies. This process is akin to learning yet another programming language. But if you have been through C and C++ and their quirks, there shouldn't be much to deter you in the LLVM IR. Listing 1 shows your first program, which prints "Hello World!" in the console output. To compile this code, you use llvm-gcc.

Listing 1. The familiar-looking Hello World program

#include <stdio.h>
int main( )
{
  printf("Hello World!\n");
}

To compile the code, enter this command:

Tintin.local# llvm-gcc helloworld.cpp -S -emit-llvm 

After compilation, llvm-gcc generates the file helloworld.s, which contains LLVM assembly. You can execute it using lli to print the message to the console. The lli usage is:

Tintin.local# lli helloworld.s
Hello World!
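
The llc route described earlier can be tried on the same file. What follows is only a minimal sketch, not a definitive recipe: it assumes that llc and a native gcc are on your PATH and that helloworld.s is the LLVM assembly produced above. The output names helloworld.native.s and helloworld are hypothetical, chosen for this illustration.

```shell
#!/bin/sh
# Sketch of the llc flow: LLVM IR -> native assembly -> executable.
# The file names below are assumptions made for this example.
IR_FILE=helloworld.s              # LLVM assembly emitted by llvm-gcc -S -emit-llvm
NATIVE_ASM=helloworld.native.s    # platform-specific assembly written by llc
EXE=helloworld                    # final native executable

if command -v llc >/dev/null 2>&1 && [ -f "$IR_FILE" ]; then
    llc "$IR_FILE" -o "$NATIVE_ASM"    # translate LLVM IR to native assembly
    gcc "$NATIVE_ASM" -o "$EXE"        # assemble and link with the system toolchain
    ./"$EXE"                           # run the native program
    STATUS=done
else
    # Skip quietly when the tool or the input file is missing.
    echo "llc or $IR_FILE not found; skipping the native build" >&2
    STATUS=skipped
fi
```

Running the native executable should print the same message that lli printed, the difference being that the translation to machine code happens ahead of time rather than when the program is run.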

Read more: IBM

Posted via email from Jasper-Net