Wednesday, January 19, 2011

Wrappers Unwrapped

When you use a COM object in .NET the .NET framework creates a Runtime Callable Wrapper (RCW) which shows up in the debugger as __ComObject. This allows interoperation between managed .NET and native COM code. One of the major differences between C++ COM and .NET is that in C++ you have deterministic destruction while in .NET objects are finalized at the garbage collector’s leisure. Which mode of operation is preferable is a matter of taste and not the topic of this post (I’ll just say that you can take my deterministic destruction when my cold dead hand goes out of scope) however mixing the two modes can cause problems.

The SOP is that the RCW calls AddRef[1] on the COM object it is holding so that it won’t disappear while it’s being used and then, when the RCW is collected by the GC, it will release this reference in its finalizer. This works if you don’t care exactly when the COM object is released and .NET exposes Marshal.ReleaseComObject for cases in which you want to manually control the object’s lifetime.
We recently re-implemented a few interfaces (that were previously native COM) in .NET and stuff stopped working. A bit of debugging later I find that we’re calling ReleaseComObject on an object that is now implemented in .NET, this causes an ArgumentException to be thrown. A cow-orker pointed me at a post titled Marshal.ReleaseComObject Considered Dangerous (well at least it’s not harmful), this post describes the problem we were facing and the solution I already put in place (check with Marshal.IsComObject before calling ReleaseComObject) but goes on saying that this isn’t enough since once you call ReleaseComObject the RCW is disconnected from the COM object and can’t be used (and that this can be a problem if the RCW is cached). I thought that this is a theoretical problem (I know we aren’t caching the RCWs) and kept trying to figure out why our code still wasn’t working.
It turns out that as in everything in programming it’s all about adding another level.  We had three levels, a .NET class (which we’ll call A) a native class (C) and class B which was native in a previous incarnation but is now .NET, A holds a reference to B which hands it a C (via a method or property), but why is this a problem?

For that we’ll need to dive a bit deeper into how RCWs work.
In order to preserve COM’s identity semantics whenever a COM object is passed into the CLR the .NET framework first checks if there is an active RCW for this object, if so it returns the existing object. You can see this by debugging your native object, after an object is returned from a native COM method (or property) it is followed by a QueryInterface for IUnknown and then by two calls to Release (one for the reference gained in the original function and one for the reference gained in QueryInterface). Additionally the RCW maintains a count of the times it has been reused, MSDN refers to this as a reference count but I’ll refer to it as marshal count in order to emphasise the difference between this number and the native COM object’s reference count. If an RCW is reused at the identity checking phase the marshal count is incremented. When ReleaseComObject is called on an RCW the marshal count is decremented. If the marshal count reaches zero the RCW releases its references to the native COM object and marks itself as invalid, any subsequent calls to the wrapped object will throw an InvalidComObjectException with the message “COM object that has been separated from its underlying RCW cannot be used”. If an RCW is collected by the GC while it is still in a valid state it will also release it’s wrapped object (regardless to its marshal count).

Now consider the situation we had when B was a native COM object. Whenever A got a C from B said C had crossed the CLR boundary and a new RCW was created (or at least the marshal count was incremented). Then when A was done with C it called ReleaseComObject and all was fine and dandy.

Read more:  I will not buy this blog, it is scratched!