Tuesday, October 09, 2012

Fun with Constrained Programming

Believe it or not, RAR files can contain bytecode for a simple x86-like virtual machine called the RarVM. This is designed to provide filters (preprocessors) to perform some reversible transformation on input data to increase redundancy, and thus improve compression.

For example, one filter (likely inspired by LZX, an earlier scheme with a similar feature) is called "Intel E8 preprocessing", which is designed to increase redundancy in x86 code.

If you imagine a program like this:

mov   foo, bar 
cmp   bar, baz
push  foo 
call  write
push  bar
call  write

The two calls are relative to the branch location, so there will be two different encodings of the same instruction. At compress time, we can translate them to absolute addresses. This way, the same instruction appears twice, and can therefore be compressed much more efficiently. When the archive is decompressed, this transformation can be reversed to restore the original.

WinRAR includes around a dozen standard filters that improve compression of several common inputs, but surprisingly also allows new filters to be defined at runtime by archives!

As far as I'm aware, no tool exists to explore this functionality. This is just too tempting to play with, so I've written some. So far, I have a RAR toolchain working, as well as some documentation on how to write programs.

You can checkout the code and some example programs on github, https://github.com/taviso/rarvmtools.

Read more: Tavis Ormandy
QR: Inline image 1

Posted via email from Jasper-Net