Monday, May 30, 2011

Anatomy of a .NET Assembly - Signature encodings

If you've just joined this series, I highly recommend you read the previous posts, starting here, or at least these posts covering the CLR metadata tables.
Before we look at custom attribute encoding, we first need to have a brief look at how signatures are encoded in an assembly in general.

Signature types
There are several types of signatures in an assembly. All of them share a common base representation, and all are stored as binary blobs in the #Blob heap, referenced by an offset from various metadata tables.
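As a rough sketch of what "stored as binary blobs" means in practice, the snippet below pulls one blob out of the #Blob heap. It assumes you already have the raw bytes of the heap (finding it via the metadata root and stream headers isn't shown). Each blob is prefixed with its length, written as an ECMA-335 compressed unsigned integer, and the offset comes from a blob column in a metadata table. The function names and the tiny example heap are hypothetical.

```python
def read_compressed_uint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an ECMA-335 compressed unsigned integer at data[pos].

    Returns (value, number_of_bytes_consumed).
    """
    b0 = data[pos]
    if (b0 & 0x80) == 0x00:  # 0xxxxxxx: 1 byte, 0x00..0x7F
        return b0, 1
    if (b0 & 0xC0) == 0x80:  # 10xxxxxx: 2 bytes, up to 0x3FFF
        return ((b0 & 0x3F) << 8) | data[pos + 1], 2
    if (b0 & 0xE0) == 0xC0:  # 110xxxxx: 4 bytes, up to 0x1FFFFFFF
        value = (((b0 & 0x1F) << 24) | (data[pos + 1] << 16)
                 | (data[pos + 2] << 8) | data[pos + 3])
        return value, 4
    raise ValueError(f"invalid compressed integer lead byte 0x{b0:02x}")


def read_blob(blob_heap: bytes, offset: int) -> bytes:
    """Return the blob at `offset` in a #Blob heap (offsets come from table columns)."""
    length, prefix_size = read_compressed_uint(blob_heap, offset)
    start = offset + prefix_size
    return blob_heap[start:start + length]


# Hypothetical heap: offset 0 holds the empty blob; offset 1 holds a
# two-byte field signature (FIELD 0x06, then ELEMENT_TYPE_I4 0x08).
heap = bytes([0x00, 0x02, 0x06, 0x08])
print(read_blob(heap, 1).hex())  # 0608
```

The same compressed-integer format turns up again inside the signatures themselves, which is why it's worth having as a separate helper.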

The types of signatures are:
Method definition and method reference signatures.
Field signatures.
Property signatures.
Method local variables. These are referenced from the StandAloneSig table, which is then referenced by method body headers.
Generic type specifications. These represent a particular instantiation of a generic type.
Generic method specifications. Similarly, these represent a particular instantiation of a generic method.
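Which kind of signature a blob holds is really determined by the table and column that points at it, but most kinds also announce themselves in their first byte. The sketch below decodes that byte using the calling-convention values from ECMA-335 Partition II, 23.2: FIELD (0x6), LOCAL_SIG (0x7), PROPERTY (0x8), GENERICINST (0xA) for method specifications, and DEFAULT/VARARG for methods, with GENERIC, HASTHIS and EXPLICITTHIS flags in the upper bits. Generic type specifications are deliberately left out: a TypeSpec blob starts directly with an element type (for example GENERICINST, 0x15), which this function would misread as a generic VARARG method. The names here are hypothetical.

```python
# Low 4 bits of a signature's first byte (ECMA-335 Partition II, 23.2).
SIG_KINDS = {
    0x0: "method (DEFAULT)",
    0x5: "method (VARARG)",
    0x6: "field",
    0x7: "local variables",
    0x8: "property",
    0xA: "generic method instantiation",
}
# Flags that may be ORed into the upper bits of the same byte.
GENERIC = 0x10       # method has generic parameters
HASTHIS = 0x20       # instance method or property
EXPLICITTHIS = 0x40  # 'this' appears explicitly in the parameter list


def describe_signature_header(first_byte: int) -> str:
    """Describe a (non-TypeSpec) signature blob from its first byte."""
    kind = SIG_KINDS.get(first_byte & 0x0F)
    if kind is None:
        # e.g. unmanaged calling conventions used by stand-alone method sigs
        return f"other (0x{first_byte:02x})"
    flags = [name for name, bit in (("GENERIC", GENERIC),
                                    ("HASTHIS", HASTHIS),
                                    ("EXPLICITTHIS", EXPLICITTHIS))
             if first_byte & bit]
    return f"{kind} [{', '.join(flags)}]" if flags else kind


print(describe_signature_header(0x06))  # field
print(describe_signature_header(0x30))  # method (DEFAULT) [GENERIC, HASTHIS]
```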
All these signatures share the same underlying mechanism to represent a type.
Representing a type
All metadata signatures are based around the ELEMENT_TYPE enumeration. This assigns a number to each 'built-in' type in the framework; for example, UInt16 is 0x07, String is 0x0e, and Object is 0x1c. Byte codes are also used to indicate SzArrays (single-dimensional, zero-based arrays), multi-dimensional arrays, custom types, and generic type and method variables. However, these require some further information.
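To make the split between self-contained and "needs more data" element types concrete, here is a sketch of most of the ELEMENT_TYPE values from ECMA-335 Partition II, 23.1.16 as a Python IntEnum. It is not exhaustive (END, the custom modifiers, SENTINEL and PINNED are omitted), and the `BUILT_IN` table and `describe_element_type` helper are hypothetical names for illustration.

```python
from enum import IntEnum


class ElementType(IntEnum):
    """A subset of the ELEMENT_TYPE values from ECMA-335 Partition II, 23.1.16."""
    VOID = 0x01
    BOOLEAN = 0x02
    CHAR = 0x03
    I1 = 0x04
    U1 = 0x05
    I2 = 0x06
    U2 = 0x07
    I4 = 0x08
    U4 = 0x09
    I8 = 0x0A
    U8 = 0x0B
    R4 = 0x0C
    R8 = 0x0D
    STRING = 0x0E
    PTR = 0x0F          # followed by the pointed-to type
    BYREF = 0x10        # followed by the referenced type
    VALUETYPE = 0x11    # followed by a TypeDefOrRef coded token
    CLASS = 0x12        # followed by a TypeDefOrRef coded token
    VAR = 0x13          # generic type parameter, followed by its index
    ARRAY = 0x14        # followed by element type, rank, sizes and bounds
    GENERICINST = 0x15  # followed by the generic type and its type arguments
    TYPEDBYREF = 0x16
    I = 0x18
    U = 0x19
    FNPTR = 0x1B        # followed by a method signature
    OBJECT = 0x1C
    SZARRAY = 0x1D      # single-dimensional, zero-based array; followed by element type
    MVAR = 0x1E         # generic method parameter, followed by its index


# Element types that are complete on their own, with their framework type names.
BUILT_IN = {
    ElementType.VOID: "System.Void",
    ElementType.BOOLEAN: "System.Boolean",
    ElementType.CHAR: "System.Char",
    ElementType.I1: "System.SByte",
    ElementType.U1: "System.Byte",
    ElementType.I2: "System.Int16",
    ElementType.U2: "System.UInt16",
    ElementType.I4: "System.Int32",
    ElementType.U4: "System.UInt32",
    ElementType.I8: "System.Int64",
    ElementType.U8: "System.UInt64",
    ElementType.R4: "System.Single",
    ElementType.R8: "System.Double",
    ElementType.STRING: "System.String",
    ElementType.TYPEDBYREF: "System.TypedReference",
    ElementType.I: "System.IntPtr",
    ElementType.U: "System.UIntPtr",
    ElementType.OBJECT: "System.Object",
}


def describe_element_type(code: int) -> str:
    """Name a built-in type, or report that more data follows in the signature."""
    et = ElementType(code)  # raises ValueError for codes not listed above
    if et in BUILT_IN:
        return BUILT_IN[et]
    return f"{et.name} (followed by more data)"


print(describe_element_type(0x07))  # System.UInt16
print(describe_element_type(0x12))  # CLASS (followed by more data)
```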
First, custom types (i.e. types that aren't one of the built-in types). After the CLASS (0x12) or VALUETYPE (0x11) element type, these require the 4-byte TypeDefOrRef coded token of the type. This 4-byte value is compressed before being written out to disk (for more excruciating details, you can refer to the CLI specification).
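For the curious, here is a sketch of that compression as ECMA-335 describes it (Partition II, 23.2 and 23.2.8). The token's table (TypeDef, TypeRef, or TypeSpec, which the spec's encoding also allows) becomes a 2-bit tag in the low bits, the row number is shifted left by 2 above it, and the result is written as a 1, 2 or 4 byte compressed integer, so small row numbers fit in a single byte. The input is assumed to be a standard 4-byte metadata token with the table number in the top byte; the function names are hypothetical.

```python
# Table number (top byte of a metadata token) -> 2-bit tag used in signatures.
TYPE_TOKEN_TAGS = {
    0x02: 0,  # TypeDef
    0x01: 1,  # TypeRef
    0x1B: 2,  # TypeSpec
}


def compress_uint(value: int) -> bytes:
    """Encode an unsigned integer in the ECMA-335 compressed (1, 2 or 4 byte) form."""
    if value < 0:
        raise ValueError("compressed integers must be non-negative")
    if value <= 0x7F:
        return bytes([value])                           # 0xxxxxxx
    if value <= 0x3FFF:
        return (0x8000 | value).to_bytes(2, "big")      # 10xxxxxx xxxxxxxx
    if value <= 0x1FFFFFFF:
        return (0xC0000000 | value).to_bytes(4, "big")  # 110xxxxx + 3 bytes
    raise ValueError("value too large to compress")


def encode_type_token(token: int) -> bytes:
    """Turn a 4-byte TypeDef/TypeRef/TypeSpec token into its compressed signature form."""
    table, row = token >> 24, token & 0x00FFFFFF
    if table not in TYPE_TOKEN_TAGS:
        raise ValueError(f"token 0x{token:08x} is not a type token")
    return compress_uint((row << 2) | TYPE_TOKEN_TAGS[table])


def encode_custom_type(token: int, is_value_type: bool) -> bytes:
    """Emit CLASS (0x12) or VALUETYPE (0x11) followed by the encoded token."""
    lead = 0x11 if is_value_type else 0x12
    return bytes([lead]) + encode_type_token(token)


print(encode_type_token(0x01000012).hex())         # 49
print(encode_custom_type(0x02000003, True).hex())  # 110c
```

The first call matches the worked example in the spec: TypeRef row 0x12 becomes (0x12 << 2) | 1 = 0x49, a single byte in place of the original four.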

Read more: Simple talk