For some time now, I have been very interested in how programming languages are compiled and converted to assembly. Naturally, I became interested in the LLVM intermediate representation (LLVM IR), LLVM's portable assembly language, and in how its syntax compares to x86 assembly. It turns out that LLVM IR is very readable and understandable, unlike the bare-bones assembly you get when disassembling a program. I also found that x86 assembly is a lot easier to analyze when it has been generated with LLVM as an intermediary, because LLVM's block labels are carried over into the assembly as comments and the overall structure stays very similar. Let us look at an example of a simple hello world program.
@0 = internal constant [20 x i8] c"Hello LLVM-C world!\00"
declare i32 @puts(i8*)
define void @sayHelloWorld() {
aName:
%0 = call i32 @puts(i8* getelementptr inbounds ([20 x i8]* @0, i32 0, i32 0))
ret void
}
Looking through the example above, the global @0 is initialized with a string constant. An external function, whose body is defined somewhere else, must be declared with the declare keyword. A new function, however, must be defined along with its body, return type, and parameters. Each function body consists of at least one basic block; here the single block is labeled aName. After the block label, the call instruction invokes the externally defined puts function. Its argument is computed by the getelementptr constant expression, which calculates the address of an element inside an aggregate from a list of indices; here the indices 0, 0 select the first character of the array, producing the i8* that puts expects. The inbounds keyword does not add a runtime bounds check. Instead, it asserts that the computed address stays within the bounds of the object, and if it does not, the result is a poison value. The virtual register %0 receives the i32 returned by puts, and the function then returns void. Now I will present the equivalent x86-64 assembly.
.section __TEXT,__text,regular,pure_instructions
.globl _sayHelloWorld
.align 4, 0x90
_sayHelloWorld: ## @sayHelloWorld
Leh_func_begin0:
## BB#0: ## %aName
subq $8, %rsp
Ltmp0:
leaq ___unnamed_1(%rip), %rdi
callq _puts
addq $8, %rsp
ret
Leh_func_end0:
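The post does not show the source this IR was generated from, so here is a hypothetical C counterpart, a minimal sketch assuming a C-like origin. It illustrates two details from the walkthrough: the string literal occupies 20 bytes (19 characters plus the terminating NUL), which is where [20 x i8] comes from, and handing the array to puts passes the address of its first element, the same address that getelementptr inbounds with indices 0, 0 computes. The names greeting and greeting_ptr are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Hypothetical C counterpart of the IR above; not the author's source.
   "static const" mirrors the internal constant @0. The array holds the
   19 visible characters plus a terminating NUL, so its size is 20. */
static const char greeting[] = "Hello LLVM-C world!";

/* Hypothetical helper: &greeting[0] is the address that
   getelementptr inbounds (..., @0, i32 0, i32 0) computes. */
const char *greeting_ptr(void) {
    return &greeting[0];
}

/* Mirrors @sayHelloWorld: call puts and ignore its int result,
   just as the IR stores it in %0 and never uses it. */
void sayHelloWorld(void) {
    puts(greeting_ptr());
}
```

Calling sayHelloWorld() from a main function prints Hello LLVM-C world! followed by a newline, since puts appends one. In the assembly, the leaq ___unnamed_1(%rip), %rdi line is where that address is loaded into the first argument register before callq _puts.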
Read more: Planet of the Ecks