As I said in a previous post, it is of the utmost importance to store the symbol files for released software. Otherwise you can forget about any serious crash dump analysis.
Real world case
At work, I was assigned to analyze a hanging web service. Curiously, the test case worked on the developer machines but hung on the test machine. I was up for the challenge, so I asked for a memory dump of the hanging process.
I loaded the dump into Visual Studio to have a look at the call stack, and tried to determine the date of the build so that I could copy the corresponding .pdb files.
Murphy’s law struck: I couldn’t find the corresponding symbol files. The symbol archive for this build hadn’t been properly stored.
So the callstack that I got looked something like this.
At least to me, that is pretty unreadable.
A quick look tells us that the process isn’t hanging in my code, but in user32.dll. That DLL is part of the Windows system libraries and implements functions for manipulating the graphical user interface.
The process was a web service running as a Windows service, and a service runs without a user interface. So already at this point I realized that it was in a state it should never be in.
Leveraging the Microsoft symbol server
Wouldn’t it be great to have the .pdb files for the Windows system libraries?
They are available!
The answer is the Microsoft symbol server.
After loading all the symbols, my callstack looked like this.
Much, much more readable. We see that the process is displaying a dialog box, and the last call, NtUserWaitMessage, tells us that it is waiting for some input, probably for us to click an OK or Cancel button. But why, and what does the dialog box say?
We have no GUI, so the dialog box is not visible. But since all parameters are passed via the stack, I loaded the memory dump into WinDbg instead and inspected the parameters on the stack. I eventually found the string that was passed to the dialog box.
It said that there wasn’t any default printer installed, and asked if I wanted to install one. Since it never got any input, the dialog box stayed open, leaving the process hanging.
That explained the error, and why it worked on the developer machines. Who doesn’t have a printer installed these days? This was a Windows service, running under the system account on Windows Server 2008. In contrast to Windows XP, where every user has the XPS file printer installed by default, Windows Server 2008 doesn’t install a default printer, and each user must configure their own printers.
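To make the step of pulling a string out of the dump a bit more concrete, here is a minimal Python sketch of what that amounts to: given a block of raw memory and the offset a stack parameter points at, decode a NUL-terminated string. It assumes the text is stored as UTF-16LE, which is what the wide-character Windows APIs use and what WinDbg's `du` command displays. The function name `read_wide_string` and the sample bytes are made up for illustration; they are not taken from the actual dump.

```python
def read_wide_string(memory: bytes, offset: int, max_chars: int = 512) -> str:
    """Decode a NUL-terminated UTF-16LE string that starts at `offset`.

    `memory` stands in for a region of the dump; `max_chars` is a safety
    limit in case no terminator is found.
    """
    end = offset
    limit = min(len(memory), offset + 2 * max_chars)
    # Walk two bytes at a time until the 16-bit NUL terminator (or the limit).
    while end + 1 < limit and memory[end:end + 2] != b"\x00\x00":
        end += 2
    return memory[offset:end].decode("utf-16-le", errors="replace")


if __name__ == "__main__":
    # Hypothetical memory region: two unrelated bytes, the text, a terminator,
    # and some trailing garbage.
    text = "No default printer"  # illustrative, not the real dialog text
    region = b"\xff\xff" + text.encode("utf-16-le") + b"\x00\x00" + b"junk"
    print(read_wide_string(region, 2))
```

In a real session the debugger does this for you; the sketch only shows why an address found on the stack is enough to recover the message.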
It proved to be a challenge to install a printer for the system user, since one cannot log on to it. I found some nasty registry patches, and some instructions for applying group policies. We worked around it by running the service under a different user, but documented the other solutions.
I solved this case without needing the .pdb files for my own binary. It was sufficient to have the symbols for the standard Windows libraries. By extension, it is possible to do a certain degree of analysis and debugging of any binary, not exclusively your own.
Furthermore, even if I had had the symbol files for my binary, they wouldn’t have been enough to see the dialog message. But I would have seen the last function executed in my code, which would have hinted that it was about printing.
I recommend that everybody configure their debugger to automatically download symbols from the Microsoft symbol server. It will prove valuable.
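As a rough illustration of what that configuration looks like: the Windows debuggers read a symbol path, commonly supplied through the `_NT_SYMBOL_PATH` environment variable, in which an entry of the form `srv*<local cache>*<server URL>` tells the debugger to fetch missing symbols from the server and keep them in the local cache. The small Python helper below only builds such a string. The cache directory `C:\Symbols` and the function name are my own assumptions, so adjust them to your setup.

```python
# Public Microsoft symbol server.
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server=MS_SYMBOL_SERVER, extra_dirs=()):
    """Build a symbol path string for the Windows debuggers.

    `extra_dirs` are plain directories (for example your own archived .pdb
    files) that are searched before the symbol server entry.
    """
    parts = list(extra_dirs)
    # srv*<cache>*<server>: download on demand and cache locally.
    parts.append(f"srv*{cache_dir}*{server}")
    return ";".join(parts)


if __name__ == "__main__":
    # C:\Symbols is an assumed cache location, not a required one.
    print(build_symbol_path(r"C:\Symbols"))
```

The printed value can be set as `_NT_SYMBOL_PATH` or pasted into the debugger's symbol path setting. Listing your own symbol archive first means your own binaries resolve without a network round trip.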