Monday, June 13, 2011

C# use Zip archives without external libraries

Introduction
I found a lot of articles on how to access Zip archives in C#, but all of the approaches have significant disadvantages. The main problem is that Microsoft has implemented Zip archives in the operating system, but there is no official API that we can use. In C#, for example, we have System.IO.Compression.GZipStream, but there is no equivalent System.IO.Compression.Zip class.

There are some free .NET compression libraries such as SharpZipLib and .NET Zip Library, but these bring additional installation effort and possible licensing problems.

It is also possible to use the free J# library. J# includes Zip support to stay compatible with the Java libraries. But bundling the 3.6 MB vjslib.dll just to support Zip seems like a really goofy hack.
Since .NET 3.0, we can use the System.IO.Packaging ZipPackage class in WindowsBase.DLL. It's just 1.1 MB, and it seems to fit a lot better than importing Java libraries.

The only problem is that the ZipPackage class isn't a generic Zip implementation; it's a packaging library for formats like XPS and Office Open XML that happen to use Zip.

Accessing plain Zip archives with ZipPackage fails because the content is checked against the Package conventions.

For example, there has to be a file [Content_Types].xml in the root, and only files with specified extensions are accessible. File names with special characters or spaces are not allowed, and access is slower than it needs to be because of the additional Package link logic.
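To see these restrictions for yourself, a small probe like the following can help. It is only a sketch: PackageProbe and TryListParts are made-up names for illustration, and it assumes a project that references WindowsBase.dll. It opens a file through the public Package API and returns the part URIs the Package layer is willing to expose, which you can then compare with the entries a normal Zip tool shows for the same file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // needs a reference to WindowsBase.dll

static class PackageProbe // hypothetical helper, not part of any library
{
    // Tries to open the file as a package and returns the URIs of its parts.
    // Returns null when the file cannot be opened or read as a package.
    public static string[] TryListParts(string path)
    {
        try
        {
            using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read))
            {
                var names = new List<string>();
                foreach (PackagePart part in package.GetParts())
                {
                    names.Add(part.Uri.ToString());
                }
                return names.ToArray();
            }
        }
        catch (Exception ex) // broad on purpose: this is only a diagnostic probe
        {
            Console.WriteLine("Could not read as package: " + ex.Message);
            return null;
        }
    }
}
```

Calling PackageProbe.TryListParts on an ordinary Zip file and on a .docx or .xps file should make the difference described above visible.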

However, the assembly WindowsBase.DLL is preinstalled, and the generic Zip implementation is inside it. The only problem is that the generic Zip classes are not public and therefore not visible to programmers. But there is a simple way to get access to this hidden API, and I wrote a small wrapper class for it.


Background

A quick check in the Object Browser shows us that WindowsBase.DLL has a namespace MS.Internal.IO.Zip. This sounds good, but there are no public classes visible.
However, the following call:

var types = typeof(System.IO.Packaging.Package).Assembly.GetTypes();

gives us 824 class types, public and non-public, including one with the name MS.Internal.IO.Zip.ZipArchive. Now it is easy to get this class type and its methods and properties:

// requires: using System.Reflection;
var type = typeof(System.IO.Packaging.Package).Assembly.GetType("MS.Internal.IO.Zip.ZipArchive");
var staticMethods = type.GetMethods(BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic);
var instanceMethods = type.GetMethods(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

and we get the most important methods:

static ZipArchive OpenOnFile(string path, FileMode mode, FileAccess access, FileShare share, bool streaming);
static ZipArchive OpenOnStream(Stream stream, FileMode mode, FileAccess access, bool streaming);
ZipFileInfo AddFile(string path, CompressionMethodEnum compmeth, DeflateOptionEnum option);
ZipFileInfo GetFile(string name);
ZipFileInfo DeleteFile(string name);
ZipFileInfoCollection GetFiles();
void Dispose();

The same procedure for ZipFileInfo gives us:

Stream GetStream(FileMode mode, FileAccess access);
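Putting the signatures above together, reading one entry through reflection could look roughly like this. Treat it as a minimal sketch, not as the wrapper class itself: InternalZipSketch, Find and ReadEntry are names invented here, the cast to IDisposable is an assumption based on the Dispose() method listed above, and the whole thing only works on a runtime whose WindowsBase.DLL still contains MS.Internal.IO.Zip. Since this is a non-public API, it can change or disappear without notice.

```csharp
using System;
using System.IO;
using System.Reflection;

static class InternalZipSketch // hypothetical name, for illustration only
{
    const BindingFlags Any = BindingFlags.Public | BindingFlags.NonPublic |
                             BindingFlags.Static | BindingFlags.Instance;

    // Looks up a method by name and exact parameter types; fails loudly if it is missing.
    static MethodInfo Find(Type type, string name, params Type[] parameterTypes)
    {
        MethodInfo method = type.GetMethod(name, Any, null, parameterTypes, null);
        if (method == null)
            throw new NotSupportedException(type.FullName + "." + name + " was not found.");
        return method;
    }

    // Reads a single entry of a Zip archive into memory via the non-public API.
    // Errors raised inside the invoked methods arrive wrapped in TargetInvocationException.
    public static byte[] ReadEntry(string zipPath, string entryName)
    {
        Type archiveType = typeof(System.IO.Packaging.Package).Assembly
            .GetType("MS.Internal.IO.Zip.ZipArchive");
        if (archiveType == null)
            throw new NotSupportedException("MS.Internal.IO.Zip.ZipArchive is not available here.");

        // static ZipArchive OpenOnFile(string, FileMode, FileAccess, FileShare, bool streaming)
        object archive = Find(archiveType, "OpenOnFile", typeof(string), typeof(FileMode),
                              typeof(FileAccess), typeof(FileShare), typeof(bool))
            .Invoke(null, new object[] { zipPath, FileMode.Open, FileAccess.Read, FileShare.Read, false });
        try
        {
            // ZipFileInfo GetFile(string name)
            object fileInfo = Find(archiveType, "GetFile", typeof(string))
                .Invoke(archive, new object[] { entryName });
            if (fileInfo == null)
                throw new FileNotFoundException("Entry not found in archive.", entryName);

            // Stream GetStream(FileMode mode, FileAccess access)
            var stream = (Stream)Find(fileInfo.GetType(), "GetStream", typeof(FileMode), typeof(FileAccess))
                .Invoke(fileInfo, new object[] { FileMode.Open, FileAccess.Read });
            using (stream)
            using (var buffer = new MemoryStream())
            {
                stream.CopyTo(buffer); // Stream.CopyTo needs .NET 4.0 or later
                return buffer.ToArray();
            }
        }
        finally
        {
            // Assumption: the archive object implements IDisposable.
            var disposable = archive as IDisposable;
            if (disposable != null) disposable.Dispose();
        }
    }
}
```

A call such as InternalZipSketch.ReadEntry("archive.zip", "readme.txt") (both names are placeholders) would return the uncompressed bytes of that entry. Passing exact parameter types to GetMethod avoids an ambiguous match if a method happens to have overloads.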

Read more: CodeProject