Introduction
(Buzzword mode ON.)
Unix filesystems (and, by extension, filesystems everywhere) follow a tree-like structure, and consequently have peculiar (but necessary) restrictions on their semantics; because of that, they employ (i.e. they can employ) a reference-counting algorithm (keeping track of “inode link counts”) to detect when resources can be freed.
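To make the link-count idea concrete, here is a small sketch that watches an inode's link count change as names are added and removed. It assumes a Unix-like system whose temporary directory supports hard links; the function name `link_count_demo` and the file names are made up for illustration.

```python
import os
import tempfile


def link_count_demo():
    """Return the inode link count observed after each step, plus a survival flag."""
    counts = []
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first")
        second = os.path.join(tmp, "second")

        # Create a file: it has one name, so its inode starts with one link.
        with open(first, "w") as f:
            f.write("payload")
        counts.append(os.stat(first).st_nlink)

        # Give the same inode a second name (a hard link).
        os.link(first, second)
        counts.append(os.stat(first).st_nlink)

        # Remove the original name; the inode is still referenced once,
        # so the data must not be freed yet.
        os.unlink(first)
        counts.append(os.stat(second).st_nlink)
        with open(second) as f:
            survived = f.read() == "payload"
    return counts, survived


if __name__ == "__main__":
    print(link_count_demo())
```

On an ordinary Unix filesystem the count should rise when the second name is created and fall again when the first is removed. The storage is released only once the count reaches zero and no process holds the file open.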
Unix has had “hard links” (i.e. multiply-referenced inodes) ever since the beginning. However, as part of the above-mentioned restrictions, hard links cannot be used on directories (actually, this limitation did not exist in the earliest versions of Unix, say, V5; but it appeared as early as V7). Berkeley Unix introduced symbolic links as a solution to this problem. Indeed, since symbolic links are asymmetric, and do not participate in the determination of what constitutes garbage (in GC terms they are “weak pointers”; actually, they are just names), their presence does not break the tree-like nature of the filesystem. But symbolic links are not entirely satisfactory either, precisely because of their asymmetric nature (or, rather, they are completely satisfactory, but only insofar as the filesystem is fixed: when we start moving or removing things, it becomes obvious that symbolic links are not perfect, “dangling symlinks” being just one small aspect of this imperfection).
Hard links, on the other hand, are elegant but practically unusable because they are too limited: they cannot be used on directories, and they cannot span devices. Because of this they have remained an odd and isolated feature of Unix (and also the cause of many security problems, because sysadmins are not too aware of them or forget about their existence: for example, a recursive chown on a user's home dir is a dangerous thing, since the user might have linked /etc/passwd into his home).
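As an illustration of the chown hazard, a cautious administrator could list the multiply-linked regular files under a directory before running anything recursive on it. The following is only a sketch: the helper name `multiply_linked_files` and the demo layout are hypothetical. A link count above one says that another name exists somewhere on the same device, not where that name is.

```python
import os
import stat
import tempfile


def multiply_linked_files(root):
    """Return sorted paths of regular files under root whose inode has other names."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat: inspect the entry itself, not a symlink target
            if stat.S_ISREG(info.st_mode) and info.st_nlink > 1:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Demo layout (made up): a "home" directory containing one ordinary file
    # and one hard link to a file that lives outside it.
    with tempfile.TemporaryDirectory() as tmp:
        home = os.path.join(tmp, "home")
        os.mkdir(home)
        outside = os.path.join(tmp, "outside")
        open(outside, "w").close()
        open(os.path.join(home, "ordinary"), "w").close()
        os.link(outside, os.path.join(home, "sneaky"))
        print(multiply_linked_files(home))
```

Any path this reports deserves a look before ownership or permissions are changed in bulk, since the change would apply to the shared inode and therefore to every one of its names.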
The GCFS project would relax these restrictions on the filesystem (that is, would provide a filesystem with more flexible semantics). This has advantages and drawbacks, of course, but the GCFS is intended more as an experiment with Unix filesystem semantics and Linux kernel hacking (HINT: c-o-o-l) than as a stable, usable, working filesystem. If this is not sufficiently clear, read: the GCFS is likely to turn your partition, or the whole Universe, into a thick goo, and make Asmodeus and all his legions flow through your nostrils; but the whole point is that this is fun.
In short, the GCFS would make the following extensions to the Unix filesystem semantics:
Permit hard links between directories. This does away with the tree-like nature of the filesystem.
Permit moving (or linking) directories to subdirectories of themselves. This introduces cycles in the structure.
Permit unlinking a non-empty directory. The files and directories it contains will then be garbage-collected if this makes them inaccessible.
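Once cycles are allowed, link counts alone can no longer tell what is garbage, which is why the third extension speaks of accessibility. The toy model below illustrates this with a plain mark-from-the-root traversal. It is a sketch of the general technique, not a description of how the GCFS is implemented, and every name in it (`reachable`, `link_counts`, `collect_garbage`, the node labels) is invented for the example.

```python
def all_nodes(entries):
    """Every node mentioned, either as a directory or as the target of an entry."""
    nodes = set(entries)
    for children in entries.values():
        nodes.update(children.values())
    return nodes


def reachable(root, entries):
    """Return the set of nodes reachable from root by following directory entries."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; this is what makes cycles harmless
        seen.add(node)
        stack.extend(entries.get(node, {}).values())
    return seen


def link_counts(entries):
    """Count, for each node, how many directory entries point at it."""
    counts = {}
    for children in entries.values():
        for target in children.values():
            counts[target] = counts.get(target, 0) + 1
    return counts


def collect_garbage(root, entries):
    """Return the nodes that can no longer be reached from root."""
    return all_nodes(entries) - reachable(root, entries)


if __name__ == "__main__":
    # Directories map entry names to nodes; "F" is a plain file.
    # B contains a link back up to A, so A and B form a cycle.
    entries = {
        "root": {"a": "A"},
        "A": {"b": "B", "f": "F"},
        "B": {"up": "A"},
    }
    del entries["root"]["a"]  # unlink the non-empty directory A from the root
    print(link_counts(entries))
    print(collect_garbage("root", entries))
```

After the unlink, A still has a link count of one (from B), so a pure reference counter would keep the whole cycle alive forever, while the traversal from the root correctly reports A, B and F as inaccessible.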
For a more detailed description of the desired semantics, see the “Semantics' description” section below.
Semantics' description
This section describes the desired semantics of the garbage collected file system without worrying about either the implementation details, or the relation to Linux.
Informal discussion and examples
Simple examples
The most important improvement I suggest over traditional semantics is to permit hard links between directories. Two hard-linked directories (but in fact, I should be speaking of “two names for the same directory”) always contain exactly the same files. Here is a simple example:
computer luser ~ $ mkdir /tmp/foo
computer luser ~ $ cd /tmp/foo
computer luser /tmp/foo $ mkdir bar
computer luser /tmp/foo $ ls -li
total 1
17291 drwxrwxr-x 2 luser luser 1024 Mar 32 14:29 bar
computer luser /tmp/foo $ ln bar baz
computer luser /tmp/foo $ ls -li
total 2
17291 drwxrwxr-x 3 luser luser 1024 Mar 32 14:29 bar
17291 drwxrwxr-x 3 luser luser 1024 Mar 32 14:29 baz
computer luser /tmp/foo $ cd bar
computer luser /tmp/foo/bar $ touch qux