Tuesday, August 03, 2010

I Want A Debugger Robot

Hi,
My name is Sabin from the Platforms Global Escalation Services team at Microsoft, and today I want to share with you a recent experience I had debugging an issue reported by a hardware manufacturer.
The customer was running a reboot test for their new server product line. They found that, over hundreds of continuous reboots, there was always one startup that took more than 20 minutes, compared with a normal startup time of about 2 minutes. This happened only once every 300+ to 1000+ reboots, and the number of reboots between occurrences varied randomly, so it was difficult to predict when the problem would occur.
Although they set up a live kernel debugging environment, they didn’t want to watch the computer screen for 10+ hours waiting for the problem to happen so they could manually hit Ctrl+Break in windbg. So instead they set up a video camera to film the computer screen 24x7, and they found that when the “mysterious delay” happened the computer showed a gray screen with “Microsoft (R) Windows (R) version 5.1 (Build 3790: Service Pack 2)”.
The case came to me and the customer even shipped a problematic server to our office to troubleshoot the cause of the delay. The problem was that I didn’t want to stare at the computer screen for 10+ hours either!
The first thing I thought was that it would be fantastic if there were a robot sitting in front of Windbg, watching the elapsed time for each reboot, so it could hit Ctrl+Break in windbg if the server took more than 10 minutes to start. Then I asked myself, “Why not?”
I decided to build such a “robot” myself.  I checked the Debugger SDK documentation (which can be found in the windbg help file debugger.chm), and I realized that what I needed was a customized debugger. The functionality of the debugger is simple: it should be able to recognize the time when the server first starts and the time when the server reboots. If more than 10 minutes pass between these two times, the customized debugger automatically breaks into the server. The event callback IDebugEventCallbacks::SessionStatus and the method IDebugControl::SetInterrupt meet my needs perfectly.
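The timing rule itself is independent of the debugger engine, so it can be sketched on its own. The following is only a minimal illustration, assuming the robot is told when the target starts and when it reboots and is polled in between. The class name BootWatchdog and its methods are hypothetical and are not part of the Debugger SDK or of DBGRobot.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the Debugger SDK): remembers when the
// target last started and reports when a startup has run past the limit.
class BootWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    explicit BootWatchdog(Clock::duration limit) : limit_(limit) {
        assert(limit > Clock::duration::zero());
    }

    // Call when the debugger sees the target start.
    void OnSessionActive(Clock::time_point now) {
        start_ = now;
        armed_ = true;
    }

    // Call when the target reboots; this startup finished, so stop timing it.
    void OnReboot() { armed_ = false; }

    // Poll periodically. True means the current startup has exceeded the
    // limit and the caller should request a break-in, for example through
    // IDebugControl::SetInterrupt.
    bool ShouldBreakIn(Clock::time_point now) const {
        return armed_ && (now - start_) > limit_;
    }

private:
    Clock::duration limit_;
    Clock::time_point start_{};
    bool armed_ = false;
};
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the rule easy to exercise without waiting for real minutes to pass.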
It is not that difficult to build such a customized debugger, which I called DBGRobot. I would like to share some code snippets which you may find helpful when building a customized debugger for a special debugging scenario, or as the basis for building a more complicated debugging robot.
First, we need to download and install the Windows Driver Kit Version 7.1.0. When installing the WDK be sure to select Debugging Tools for Windows.
http://www.microsoft.com/whdc/DevTools/WDK/WDKpkg.mspx
If you install the WDK to its default folder, which for version 7.1.0 is C:\WinDDK\7600.16385.1, the C:\WinDDK\7600.16385.1\Debuggers\sdk\samples folder will contain the sample code from the Debugger SDK. The dumpstk sample is of particular interest to us. We can copy some common code from it, such as out.cpp and out.hpp, which contain the implementation of the IDebugOutputCallbacks interface.
Now let’s do some coding.  The common code is copied from the Debugger SDK sample dumpstk. I have also listed it here for clarity.
The first step is to create the IDebugClient, IDebugControl and IDebugSymbols interfaces (although IDebugSymbols is not used in this case). You need to call the DebugCreate() function to create the IDebugClient interface, and then use IDebugClient->QueryInterface() to query the IDebugControl and IDebugSymbols interfaces.

void
CreateInterfaces(void)
{
   HRESULT Status;
   // Start things off by getting an initial interface from
   // the engine.  This can be any engine interface but is
   // generally IDebugClient as the client interface is
   // where sessions are started.
   if ((Status = DebugCreate(__uuidof(IDebugClient),
                             (void**)&g_Client)) != S_OK)
   {
       Exit(1, "DebugCreate failed, 0x%X\n", Status);
   }
   // Query for some other interfaces that we'll need.
   if ((Status = g_Client->QueryInterface(__uuidof(IDebugControl),
                                          (void**)&g_Control)) != S_OK ||
       (Status = g_Client->QueryInterface(__uuidof(IDebugSymbols),
                                          (void**)&g_Symbols)) != S_OK)
   {
       Exit(1, "QueryInterface failed, 0x%X\n", Status);
   }
}
If you want to see the output from the debugging engine, you also need to implement the IDebugOutputCallbacks interface. The main function to be implemented is IDebugOutputCallbacks::Output(), which is quite simple as we only need to see the output in the command prompt stdout stream:
STDMETHODIMP
StdioOutputCallbacks::Output(
   THIS_
   IN ULONG Mask,
   IN PCSTR Text
   )
{
   UNREFERENCED_PARAMETER(Mask);
   fputs(Text, stdout);
   return S_OK;
}
Here comes our main code logic: we need to implement the IDebugEventCallbacks interface and monitor the SessionStatus events. In order for the debugger engine to deliver the SessionStatus events to us we need to set the DEBUG_EVENT_SESSION_STATUS mask in IDebugEventCallbacks::GetInterestMask():
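To illustrate what the interest mask does, here is a minimal, portable model rather than real dbgeng code. It assumes only what the text above says: the engine asks the callback object which events it wants and delivers just those. The types EventCallbacksModel and EngineModel and the kEvent* constants are hypothetical stand-ins; a real implementation would derive from the SDK's callback interface and return DEBUG_EVENT_SESSION_STATUS from dbgeng.h.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in event bits for this sketch only. Real code would use the
// DEBUG_EVENT_* constants from dbgeng.h, not these values.
enum : std::uint32_t {
    kEventBreakpoint    = 1u << 0,
    kEventSessionStatus = 1u << 1,
};

// Stand-in for the callback object registered with the engine.
struct EventCallbacksModel {
    std::vector<std::uint32_t> sessionStatuses;  // statuses received so far
    int breakpointsSeen = 0;

    // Plays the role of IDebugEventCallbacks::GetInterestMask():
    // only session status events are requested.
    std::uint32_t GetInterestMask() const { return kEventSessionStatus; }

    void SessionStatus(std::uint32_t status) { sessionStatuses.push_back(status); }
    void Breakpoint() { ++breakpointsSeen; }
};

// Stand-in for the engine's dispatch step: an event reaches the callback
// only if its bit is set in the interest mask.
struct EngineModel {
    EventCallbacksModel* callbacks;

    void RaiseSessionStatus(std::uint32_t status) {
        assert(callbacks != nullptr);
        if (callbacks->GetInterestMask() & kEventSessionStatus)
            callbacks->SessionStatus(status);
    }

    void RaiseBreakpoint() {
        assert(callbacks != nullptr);
        if (callbacks->GetInterestMask() & kEventBreakpoint)
            callbacks->Breakpoint();
    }
};
```

In this model a breakpoint event is dropped while a session status event is delivered, which is the filtering behavior the robot relies on.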
Read more: netdebugging