Thursday, February 03, 2011

Ready... cancel... wait for it! (part 1)

One of the cardinal rules of the OVERLAPPED structure is that it must remain valid until the I/O completes. The reason is that the OVERLAPPED structure is manipulated by address rather than by value.

The word complete here has a specific technical meaning. It doesn't mean "must remain valid until you are no longer interested in the result of the I/O." It means that the structure must remain valid until the I/O subsystem has signaled that the I/O operation is finally over, that there is nothing left to do, it has passed on: You have an ex-I/O operation.

Note that an I/O operation can complete successfully, or it can complete unsuccessfully. Completion is not the same as success.

A common mistake when performing overlapped I/O is issuing a cancel and immediately freeing the OVERLAPPED structure. For example:

// this code is wrong
HANDLE h = ...; // handle to file opened with FILE_FLAG_OVERLAPPED
OVERLAPPED o;
BYTE buffer[1024];
InitializeOverlapped(&o); // creates the event etc
if (ReadFile(h, buffer, sizeof(buffer), NULL, &o) ||
    GetLastError() == ERROR_IO_PENDING) {
    if (WaitForSingleObject(o.hEvent, 1000) != WAIT_OBJECT_0) {
        // took longer than 1 second - cancel it and give up
        CancelIo(h);
        return WAIT_TIMEOUT;
    }
    ... use the results ...
}
...

The bug here is that after calling CancelIo, the function returns without waiting for the ReadFile to complete. CancelIo only requests cancellation; the I/O is still outstanding until the I/O subsystem signals completion. Returning from the function implicitly frees the automatic variables o and buffer. When the ReadFile finally completes, the I/O system writes to stack memory that has been freed and is probably being reused by another function. The result is impossible to debug: First of all, it's a race condition between your code and the I/O subsystem, and breaking into the debugger doesn't stop the I/O subsystem. If you step through the code, you don't see the corruption, because the I/O completes while you're broken into the debugger.

Read more: The Old New Thing