Monday, February 08, 2010

Managing Sparse Files on Windows

This article shows how a developer can write functions that manipulate sparse files from within a Windows application. The file system manages sparse files in a special way, and such files typically contain several unallocated and/or zeroed-out ranges. The file system reduces on-disk storage consumption by persisting only the allocated regions and tracking the unallocated and sparse (zeroed-out) regions in metadata.

Background

Sparse files are files that contain large regions that either have no data stored in them or are explicitly zeroed out. In a normal file, zeroed-out regions still consume the same amount of disk space as they would if they held valid, non-zero data. With a sparse file, the user tells the operating system and the file system that this is a special file and that the zeroed-out regions are just empty space.

File systems like NTFS typically optimize the way they store a sparse file and allocate space only for the allocated regions within it. For each range that needs to be marked as sparse, the user must ask the file system, through an IOCTL, to zero that range. NTFS then updates its internal metadata to mark the range as sparse and allocates no disk space for the zeroes.

Whether to use sparse files is an application-specific decision. Ordinary reads and writes work on a sparse file as they do on any other file, but an application that wants the space savings needs to be aware of the APIs that let it query, manage, and manipulate a sparse file. An application may decide to convert a normal file to a sparse one. This is allowed, but it is then the application's responsibility to scan the file for regions of zeroes and explicitly mark them as sparse. A normal file is made sparse by explicitly sending an IOCTL to the file system, asking it to set the file's internal sparse attribute.
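The conversion described above can be sketched in C roughly as follows. This is a minimal illustration, not the original article's code. The helper names find_zero_run, make_sparse and zero_range are hypothetical. The excerpt does not name the IOCTLs, so the sketch assumes the documented Windows control codes FSCTL_SET_SPARSE and FSCTL_SET_ZERO_DATA, and it assumes the file handle was opened with write access. The scanning helper is portable; the two IOCTL wrappers compile only on Windows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: finds the first run of zero bytes in buf[0..len)
   that is at least min_run bytes long. On success it returns 1 and stores
   the run as the half-open range [*start, *end); otherwise it returns 0. */
static int find_zero_run(const unsigned char *buf, size_t len, size_t min_run,
                         size_t *start, size_t *end)
{
    size_t i = 0;
    assert(start != NULL && end != NULL);
    while (i < len) {
        size_t j;
        if (buf[i] != 0) {
            i++;
            continue;
        }
        /* buf[i] is zero: extend j to the first non-zero byte (or len). */
        j = i;
        while (j < len && buf[j] == 0)
            j++;
        if (j - i >= min_run) {
            *start = i;
            *end = j;
            return 1;
        }
        i = j; /* run too short, keep scanning after it */
    }
    return 0;
}

#ifdef _WIN32
#include <windows.h>
#include <winioctl.h>

/* Hypothetical wrapper: asks the file system to set the sparse attribute
   on an already open file (assumed to be opened with write access). */
static BOOL make_sparse(HANDLE file)
{
    DWORD returned = 0;
    return DeviceIoControl(file, FSCTL_SET_SPARSE, NULL, 0, NULL, 0,
                           &returned, NULL);
}

/* Hypothetical wrapper: asks the file system to zero the byte range
   [start, end) of the file. On a sparse file the file system may release
   the disk space that backs the range. */
static BOOL zero_range(HANDLE file, LONGLONG start, LONGLONG end)
{
    FILE_ZERO_DATA_INFORMATION info;
    DWORD returned = 0;
    info.FileOffset.QuadPart = start;
    info.BeyondFinalZero.QuadPart = end; /* first byte after the range */
    return DeviceIoControl(file, FSCTL_SET_ZERO_DATA, &info, sizeof(info),
                           NULL, 0, &returned, NULL);
}
#endif
```

A caller on Windows would typically call make_sparse once on the handle, read the file in chunks, run find_zero_run over each chunk, and pass each run to zero_range after adding the chunk's offset within the file. The min_run parameter is there because the file system is likely to release space only in whole allocation units, so very short runs of zeroes are unlikely to save anything.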


Read more: Codeproject

Posted via email from jasper22's posterous