.NET Information center: Decompiler Design

Thursday, April 07, 2011

Decompiler Design - Introduction

Introduction

Decompilation is a form of reverse engineering of computer programs. Its goal is to convert a compiled binary file back into a source file. One may want to do this for several reasons, such as to understand how a program works, or to modify a program to enhance it or fix a bug.

Decompilation has been around for many years, probably ever since people started to compile programs from high-level languages to lower-level formats such as assembly and machine code.

Several attempts have been made to write decompilers for binary executables. This page has some examples.

There are even more decompilers available for managed environments that use byte-code, like Java and C#. An extensive list is available at the program transformation wiki.

In these pages we’ll focus on decompilation of binary executables, that is, from machine code to source code, since this is much more difficult than decompiling Java or C# byte-code.

Several languages and compilers can produce machine code, including some Java, C# and Visual Basic compilers. The decompiler therefore has to know which language the program was compiled from, and has to be able to generate code in that language. However, most of the difficult problems in decompilation arise with less restrictive languages, namely C, Pascal and C++.

Most of the algorithms apply to all languages, so we will mostly use examples written in C. When we show an algorithm, we’ll use C or C++, since compilers for these languages are the most widely available on both Linux and Windows.
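
To make the goal concrete, here is a minimal sketch in C of what "converting a binary back into source" can look like. The first function is ordinary hand-written code. The second is a hypothetical illustration of decompiler-style output for the same function; it is not the output of any real tool. The names sub_401000, arg0, arg1, v0, v1 and the loc_ labels are invented for this example, on the assumption that the original identifiers and loop structure were not preserved in the machine code.

```c
#include <assert.h>
#include <stddef.h>

/* Original source, as a programmer might write it. */
int sum_array(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/*
 * Hypothetical decompiler-style output for the same function.
 * Assumption: names and the "for" loop were lost during compilation,
 * so the result uses placeholder names and explicit jumps.
 */
int sub_401000(const int *arg0, size_t arg1)
{
    int v0 = 0;        /* was "total" */
    size_t v1 = 0;     /* was "i" */
loc_loop:
    if (v1 >= arg1)
        goto loc_exit;
    v0 += arg0[v1];
    v1++;
    goto loc_loop;
loc_exit:
    return v0;
}

/* Check that both versions compute the same result. */
void check_equivalence(void)
{
    const int sample[] = {3, 1, 4, 1, 5};
    assert(sum_array(sample, 5) == sub_401000(sample, 5));
    assert(sub_401000(sample, 0) == 0);
}
```

Both functions behave identically, but the second is much harder to read. Recovering meaningful structure such as loops, types and names from code like this is one of the problems a decompiler has to address.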