Monday, December 24, 2012

Linq to Collections: Beyond IEnumerable

The latest version of the .NET Framework (4.5) includes several new interfaces representing readonly collections: IReadOnlyCollection<T>, IReadOnlyList<T>, and IReadOnlyDictionary<TKey, TValue>. In this post I’ll be explaining how to use these types to make your code easier to reason about and more efficient (particularly with respect to memory usage).

The methods and types I’ll be describing are available as a software library (called “LinqToCollections”, in the public domain). You can view the library’s source code on GitHub or directly reference its NuGet package.

Before I start talking about really taking advantage of the new readonly collection interfaces, I want to justify why it’s worth bothering with them in the first place. Why not just use the array or IList<T> types? Why not just use the more general IEnumerable<T> type? It’s worked in the past, so why not continue doing so?

First, it is better to ask for an IReadOnlyList<T>, as opposed to an IList<T> or an array, when you don’t need the ability to mutate. Doing so obeys the principle of asking only for what you need. If you don’t need the ability to mutate what you’re given, don’t ask for a thing with that ability. This increases the number of cases where your method can be used and makes reasoning about it easier. If you ask for an IReadOnlyList<T>, then a user (or automated tool) can trivially determine that their precious list won’t be mangled by your method. But, if you ask for an IList<T>, they have to dig through documentation, inspect the source code, or (more commonly) guess in order to make the same determination. For example, you’re less likely to mistakenly reverse the arguments to a memcpy-like method that asks for a readonly source because the wrong ordering can fail type checking.
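As a minimal sketch of the memcpy-like case, assuming a hypothetical helper named CopyTo (it is not taken from the LinqToCollections library), the signature alone can document which argument is read and which is written:

```csharp
using System;
using System.Collections.Generic;

public static class CopyExample {
    // Hypothetical memcpy-like helper, for illustration only.
    // The source is only read, so it is requested as IReadOnlyList<T>;
    // the destination is written to, so it is requested as IList<T>.
    public static void CopyTo<T>(IReadOnlyList<T> source, IList<T> destination) {
        if (source == null) throw new ArgumentNullException("source");
        if (destination == null) throw new ArgumentNullException("destination");
        if (destination.Count < source.Count)
            throw new ArgumentException("Destination is too small.", "destination");

        // Overwrite the first source.Count slots of the destination.
        for (int i = 0; i < source.Count; i++)
            destination[i] = source[i];
    }
}
```

If the caller holds the source as an IReadOnlyList<int> and the destination as an int[], then CopyExample.CopyTo(source, destination) compiles, while the reversed call CopyExample.CopyTo(destination, source) does not, because an IReadOnlyList<int> is not an IList<int>. Note that the check only helps when the source's static type really is read-only: arrays and List<T> implement both interfaces, so swapping two of those would still compile.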

Second, it is better to expose an IReadOnlyList<T>, instead of an IList<T> or an array, when you don’t allow the ability to mutate. This follows the principle of providing what you promise. If you return an IList<T> (that happens to be readonly at runtime) instead of an IReadOnlyList<T>, users will be misled into trying to modify the result. For example, before following the link to the documentation, tell me whether Dictionary<K, V>.KeyCollection.Remove does the same thing as Dictionary<K, V>.Remove or else fails with a NotSupportedException. The operation makes sense, implying it should do the same thing as Dictionary<K, V>.Remove, but it’s a bit odd to have a collection you can remove from but not add to, implying it should fail. This question wouldn’t even come up if KeyCollection implemented IReadOnlyCollection<T> instead of ICollection<T>.
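To illustrate the "provide what you promise" side, here is a small sketch using a hypothetical Playlist type (invented for this example). The class mutates its own list internally, but the property it exposes offers no Add or Remove, so callers never have to wonder whether those would work:

```csharp
using System.Collections.Generic;

// Hypothetical type, for illustration only.
public sealed class Playlist {
    private readonly List<string> _tracks = new List<string>();

    // Mutation goes through the owning class.
    public void AddTrack(string title) {
        _tracks.Add(title);
    }

    // Callers can count, index, and enumerate the tracks,
    // but the static type has no members that modify the list.
    public IReadOnlyList<string> Tracks {
        get { return _tracks; }
    }
}
```

One caveat with this sketch: a determined caller can still downcast Tracks back to List<string>. If that matters, returning _tracks.AsReadOnly() instead wraps the list in a ReadOnlyCollection<string>, which also implements IReadOnlyList<string>.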

Finally, asking for an IReadOnlyList<T>, instead of an IEnumerable<T> that you immediately transform into an IReadOnlyList<T>, is more efficient. You should be asking for what you actually need, so that callers that happen to already have it can avoid unnecessary transformation costs (asking for what you actually need also helps with unit testing). Note that the overload taking IEnumerable<T> can still exist, but it becomes a mere convenience method that delegates to the overload taking an IReadOnlyList<T>. A good example of this optimization is the Enumerable.Reverse method. Consider this slightly-simplified implementation:

public static IEnumerable<T> Reverse<T>(this IEnumerable<T> source) {
    // Copy the whole input first: linear time and memory before the first item.
    var buffer = source.ToArray();
    for (int i = buffer.Length - 1; i >= 0; --i)
        yield return buffer[i];
}

Enumerable.Reverse has to make a copy of the input sequence, using linear time and consuming linear memory before it can even yield the first item! However, if callers have an IReadOnlyList<T>, the expensive step can be omitted altogether:

public static IEnumerable<T> Reverse<T>(this IReadOnlyList<T> source) {
    // No copy needed: index straight into the list, from the back.
    for (int i = source.Count - 1; i >= 0; --i)
        yield return source[i];
}

Asking for a list instead of an enumerable reduces both the memory usage and time-to-first-item from linear to constant. Calling this a big improvement is a bit of an understatement. This is not an isolated case either: the cost of lots of other operations drops from linear to constant when working on a list instead of an enumerable. For example: “pick a random item”, “skip the first/last N items”, “take the last N items”, “give me every N’th item”, “partition items into adjacent groups of size N”, not to mention things like binary search and graph traversal (when using an adjacency matrix).
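The same pattern can be sketched for “take the last N items”, together with the convenience overload mentioned earlier that accepts an IEnumerable<T> and delegates to the list version. The names ListSketches and TakeLastItems are made up for this example and are not claimed to match the LinqToCollections API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListSketches {
    // List overload: no buffering, constant work before the first item.
    public static IEnumerable<T> TakeLastItems<T>(this IReadOnlyList<T> source, int count) {
        if (source == null) throw new ArgumentNullException("source");
        if (count < 0) throw new ArgumentOutOfRangeException("count");
        return TakeLastIterator(source, count);
    }

    // Separate iterator so the argument checks above run at call time.
    private static IEnumerable<T> TakeLastIterator<T>(IReadOnlyList<T> source, int count) {
        // Start at the first of the last `count` items (or at 0 if the list is shorter).
        for (int i = Math.Max(0, source.Count - count); i < source.Count; i++)
            yield return source[i];
    }

    // Convenience overload: pays the linear copy once (eagerly, at call time),
    // then delegates to the list overload.
    public static IEnumerable<T> TakeLastItems<T>(this IEnumerable<T> source, int count) {
        if (source == null) throw new ArgumentNullException("source");
        return TakeLastItems((IReadOnlyList<T>)source.ToArray(), count);
    }
}
```

A caller that already holds a List<T> or an array binds to the list overload and skips the copy; a caller with only an IEnumerable<T> still gets a working method, just at the linear cost it would have paid anyway.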

Posted via email from Jasper-Net