Thursday, May 20, 2010

THE CASE OF THE LOW HANGING FILTER DRIVER FRUIT

Not all our cases are crashes, leaks, or high CPU.  Sometimes the problems we are faced with are purely a question of why a given application runs slow on a particular version of Windows versus another version of windows.  In other cases an application may just start running slow for no reason.   OK, not likely.  There is ALWAYS SOME RESASON.  Something changed!  In this case, the customer reported that an application started running slow when booted into “Normal Mode”, but when the OS was booted in safe mode, the application would run fast.  In this particular case the customer reported that a given operation went from taking just a few seconds (safe mode) to several minutes (normal mode).   Further research found that the problem was related to accessing the registry and registry performance in general.  At this point I’m already thinking, “Registry Access?” and “Safe Mode”.  What could affect registry access that does not run in safe mode?  Well lots of services DO NOT start in safe mode.  What kind of services could affect registry calls?  Antivirus?  Maybe…  Let’s look deeper.

One of the first things I typically do in such cases is to ask for a kernrate log of the slow and fast scenario. http://download.microsoft.com/download/8/e/c/8ec3a7d8-05b4-440a-a71e-ca3ee25fe057/rktools.exe  Kernrate is a sampling profiler.   It basically checks the location of the instruction pointer at regular intervals and stores the results in a hash table.   We can then get a breakdown of the %time spent in each module that is executing.    Even better you can zoon in to each module.  Zooming in shows utilization at a function level within the module and requires symbols to be present in a flat symbol directory on the machine being profiled.  I recommend downloading the symbol pack for this (http://www.microsoft.com/whdc/devtools/debugging/symbolpkg.mspx) or use symchk.exe (included in the debugging tools) to download the symbols.  We’ll talk more about symbols and symchk.exe in an upcoming post.

In a lot of cases kernrate data is only a starting point.   We will find some code that is running a lot longer in one case verses another and that in turn requires a follow up code review and multiple debugs to further isolate the problem.  This case however was different.  The following is output from beyond compare that shows a comparison between the module execution time in kernel.   The slow test run is on the right, and the fast test run is on the left.  Keeping in mind that I was looking for something different between safe mode and normal mode, I simply started by looking at the modules listed on the slow side (Right) there were not on the fast side (Left).   What was loaded during the normal run that was not a factor during the safe mode run.  Right away FILTERDRVXYZ just above the HAL jumped off the page.  (Some names were changed to protect the innocent. J)  I did a http://www.live.com search to find out what this driver was.  It was a file system filter driver for an antivirus program.

Read more: NTDEBUGGING BLOG

Posted via email from jasper22's posterous