Debug Xen Hosted Windows Kernel Over Network

Read the original at my company’s blog.

Blue screens are not a rare commodity when working with virtualization. Most of the time, full crash dumps do the trick, but sometimes live kernel debugging is required. Hard disk related crashes that prevent memory dumping are a good example, but there are also times when it’s just easier to follow the entire crash flow instead of just witnessing the final state.

Type 2 (hosted) virtualization usually comes with an easy solution. But type 1 (bare metal) virtualization, like Xen, complicates matters. Debugging must be offloaded to a remote Windows machine. The common solution, it seems, is to tunnel the hosted machine’s serial connection over TCP to another Windows machine where WinDBG is running, waiting anxiously for a bug check. There are many websites describing this setup in various component combinations. I have gathered here all the tricks I could find plus some more of my own to streamline the process and get rid of commercial software.

Let’s dive into the nitty-gritty little details, shall we?

Hosted Windows

Kernel debugging requires some boot parameters. Windows XP includes a utility called bootcfg.exe that makes this easy.

bootcfg /copy /id 1 /d "kernel debug"
bootcfg /raw "/DEBUG /DEBUGPORT=COM1" /id 2 /a
bootcfg /raw "/BAUDRATE=115200" /id 2 /a
bootcfg /copy /id 2 /d "kernel debug w/ break"
bootcfg /raw "/BREAK" /id 3 /a

This assumes you have only one operating system configured in the Windows boot loader. If the boot loader menu shows up when Windows boots, you might need to add the flags to C:\boot.ini yourself.

Xen Host

Windows will now try to access the serial port in search of a debugger. Xen’s domain configuration file can be used to forward the serial port over TCP. Locate your domain configuration file and add the following line. The configuration files are usually located under /etc/xen.

serial='tcp::4444,server,nowait'
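Once the domain is running, it can save some head scratching to confirm that the forwarded port is actually reachable from the debugger machine before involving WinDBG. The following is a minimal sketch, not part of the original setup; the function name is made up for illustration and 4444 is simply the port from the line above. As far as I know the forwarded port serves one client at a time, so run it before starting the real connection rather than alongside it.

```python
import socket


def serial_port_reachable(host: str, port: int = 4444, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # Open the connection and close it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example (192.168.0.5 stands in for the address of your Xen host):
#   serial_port_reachable("192.168.0.5")
```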

Debugger Machine

The server side is set and it’s time to move on to the client. As previously mentioned, WinDBG doesn’t care for TCP. Instead of the usual TCP to RS-232 solution, named pipes are used here. I wrote a little application called tcp2pipe (download available at the bottom) which simply pumps data between a TCP socket and a named pipe. It takes three parameters: IP, port and named pipe path. The IP address is the address of the Xen host and the port is 4444. For the named pipe path, use \\.\pipe\XYZ, where XYZ can be anything.

tcp2pipe.exe 192.168.0.5 4444 \\.\pipe\XYZ
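To give an idea of what "pumping" means here, below is a rough sketch of a two-way byte pump. It is not the tcp2pipe source. Both endpoints are plain sockets so the example runs anywhere, whereas the real tool has a Windows named pipe on one side (which would need CreateNamedPipe, ReadFile and WriteFile instead). The names pump and bridge are invented for this illustration.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> None:
    """Copy bytes from src to dst until src reports end of stream."""
    while True:
        data = src.recv(bufsize)
        if not data:
            break
        dst.sendall(data)
    # Pass the end of stream on so the reader behind dst can finish too.
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass


def bridge(a: socket.socket, b: socket.socket) -> list:
    """Start one pump per direction and return the two worker threads."""
    threads = [
        threading.Thread(target=pump, args=(a, b), daemon=True),
        threading.Thread(target=pump, args=(b, a), daemon=True),
    ]
    for thread in threads:
        thread.start()
    return threads
```

One thread per direction keeps the code short; a single thread using select would work just as well.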

All that is left now is to fire up WinDBG and connect it to \\.\pipe\XYZ. This can be done from the menus, or from the command line.

windbg -k com:pipe,port=\\.\pipe\XYZ

To make this even simpler, you can use kdbg.bat and pass it just the IP. It assumes WinDBG.exe is installed in c:\progra~1\debugg~1. If that’s not the case, you’ll have to modify it and point it to the right path.

tcp2pipe

The source code is included in the zip file and is in the public domain.

Download tcp2pipe.zip (mirror).

Happy debugging!

SourceForge tracker and SVN integration

TortoiseSVN and some other Subversion clients have a cool feature for integration with issue trackers. By setting a few properties for a project, you can have automatically generated links to tracked issues directly from SVN commit messages. TortoiseSVN also supports a special input method for issue numbers in commit boxes for easier integration.

The basic property that must be set is bugtraq:url, which lets SVN know how to form a URL from an issue number. This is a bit difficult with SourceForge, where there are in fact at least three trackers, each with its own URL scheme that includes the tracker identifier. Toying with the URL and removing both group and tracker identifiers results in an error. After playing around some more, I found the following URL that works like a charm for every group and any tracker, so there isn’t even a need to figure out the group identifier.

http://sourceforge.net/support/tracker.php?aid=%BUGID%
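For anyone unsure what the client does with that value: it substitutes the issue number from the commit message for the %BUGID% placeholder. A small sketch of that substitution follows; the function name and the issue number in the example are made up for illustration.

```python
BUGTRAQ_URL = "http://sourceforge.net/support/tracker.php?aid=%BUGID%"


def issue_url(bug_id: int, template: str = BUGTRAQ_URL) -> str:
    """Expand the %BUGID% placeholder into a link for one issue."""
    return template.replace("%BUGID%", str(bug_id))


# issue_url(1234567) gives a link ending in "?aid=1234567"
# (1234567 is a hypothetical issue number).
```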

It took me a while to figure out and I couldn’t find any mention of it online, so I thought I’d post it here to help others.

For more information on all of the related properties see TortoiseSVN’s page.

Pragmatic variant

As mentioned in my previous post, I have been working on incorporating some more features into WinVer.nsh. Every little change in this header file requires testing on all possible versions and configurations of Windows. Being the Poor Open Source Developer™ that I am, I do not have sufficient resources to assemble a full-blown testing farm with every possible version of Windows on every possible hardware configuration. Instead, I have to settle for a bunch of virtual machines I have collected over the years. It is pretty decent, but has no standards and doesn’t cover every possible version. Still, it does its job well and has proven itself very effective.

Obviously, be it a farm or a mere collection of virtual machines, testing on so many different configurations carries with it a hefty fine. Testing a single line change could waste almost an hour. Imagine the time it would take to test, fix, retest, fix and retest again a complete rewrite of WinVer.nsh. As fascinating as that empirical scientific experiment would have been, I was reluctant to find out. Laziness, in this case, proved to be a very practical solution.

WinVer.nsh tests do not really need the entire operating system and its behavior, as WinVer.nsh relies on nothing but 4 parameters. All it requires is the return values of GetVersionEx for OSVERSIONINFO and OSVERSIONINFOEX. For nothing more than 312 bytes, I have to wait until Windows Vista decides it wants to execute my test, Windows NT4 gracefully connects to my network, Windows ME wakes up on the right side of the bed and doesn’t crash, Windows Server 2008 installs again after its license has expired and Windows 95… Actually, that one works pretty well. So why wait?

Instead, I’ve created a little harvester that collects those 312 bytes, ran it on all of my machines and mustered the results into one huge script that tests every aspect of WinVer.nsh using every possible configuration of Windows in a few seconds. It required adding a hooking option to WinVer.nsh, but with the new !ifmacrondef, that was easy enough.

Currently, the script tests:

  • Windows 95 OSR B
  • Windows 98
  • Windows ME
  • Windows NT4 (SP1, SP6)
  • Windows 2000 (SP0, SP4)
  • Windows XP (SP2, SP3)
  • Windows XP x64 (SP1)
  • Windows Vista (SP0)
  • Windows Server 2008 (SP1)

If you have access to a configuration not listed here, please run the harvester and send me the results. More specifically, I could really use Windows 2003 and Windows Vista SP1. My Windows Vista installation simply refuses the upgrade to SP1. Again.

The test script also includes a hexdump of those 312 bytes for every configuration so anyone performing similar tests for another reason doesn’t have to parse the NSIS syntax. Feel free to use it for your testing.
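If you want to decode such bytes outside NSIS, here is a rough sketch for a single ANSI OSVERSIONINFOEX structure, with the field layout taken from the Windows SDK definition. How the harvester arranges its 312-byte record is not described here, so the sketch makes no assumption about that and only handles one 156-byte structure. The class and function names are invented, and the sample values in the comment are synthetic, not harvested.

```python
import struct
from typing import NamedTuple

# ANSI OSVERSIONINFOEX: five DWORDs, a 128-byte service pack string,
# three WORDs and two BYTEs, little-endian, 156 bytes in total.
OSVERSIONINFOEXA = struct.Struct("<5L128s3H2B")


class OsVersion(NamedTuple):
    major: int
    minor: int
    build: int
    platform_id: int
    csd_version: str
    sp_major: int
    sp_minor: int
    suite_mask: int
    product_type: int


def parse_osversioninfoex(raw: bytes) -> OsVersion:
    """Decode the first 156 bytes of raw as an ANSI OSVERSIONINFOEX."""
    (_size, major, minor, build, platform_id, csd,
     sp_major, sp_minor, suite_mask, product_type,
     _reserved) = OSVERSIONINFOEXA.unpack(raw[:OSVERSIONINFOEXA.size])
    # szCSDVersion is a NUL-terminated string inside a fixed buffer.
    csd_text = csd.split(b"\0", 1)[0].decode("ascii", "replace")
    return OsVersion(major, minor, build, platform_id, csd_text,
                     sp_major, sp_minor, suite_mask, product_type)


# Synthetic example input:
#   OSVERSIONINFOEXA.pack(156, 5, 1, 2600, 2, b"Service Pack 3", 3, 0, 0x100, 1, 0)
```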

Dominical update

9 out of 10 open-source experts advocate frequent releases. We, the simple people, don’t know better and should listen to the experts. Sadly, we simpletons still don’t know how to read and so the fine print eludes us. While we all may be good and obedient developers, the users don’t care for our frequent releases squashing our colossal bugs and featuring our shiny new toys. As frequent as our releases are, so are the reports of bugs fixed long ago and of features that shined and sparkled in ancient times but are now covered with rust.

Ghost versions of the past haunt us daily while users refuse to upgrade. Our innovative forefathers, suffering immensely from this plague, uncovered the great potential of automatic updates. No longer is the user able to flee his ordained destiny. Fate shall pop up and fulfill itself even in the absence of user interaction.

But even this sparsely applied method carries its own fine print. A boilerplate implementation includes a web server containing the latest version number, or even a server-side script that ever so nicely checks for the user whether his version is expectedly old. As with everything else, here too success brings failure. As faithful users gather in masses around our monthly-polished releases, the web server begins to break down. Most web servers, especially those that poor open-source developers can afford, do not offer load balancing and will easily succumb to the sheer amount of bandwidth generated by thousands of users performing even the simplest of GET requests.

Enter DNS. The Domain Name System is a distributed and globally cached system that basically maps domain names such as nsis.latest-version.org into numbers such as 2.36.0.0. And it gets even better: foreign sources report there are free DNS servers out there, waiting to be used. Services such as dyndns.org offer a simple HTTP based API that assigns a new IP to a free domain name. Creating a new version notification service is as simple as creating a new free domain, updating it every time a new version is released, resolving it with gethostbyname when the client-side loads and comparing the result to the current version.

This free and simple solution provides many advantages over a conventional HTTP-based version check.

  • Automatic load balancing with servers all over the world.
  • Simple code with no need for complex HTTP libraries.
  • No need for relatively heavy HTTP operations for both client and server.
  • HTTP proxies do not get in the way.
  • Firewalls and the entire security fiasco usually overlook DNS.

And as always, there are disadvantages.

  • Updates take time to propagate.
  • Only 3 bytes of information.

Set the first byte to 127 so that the IP associated with your update domain is a loopback address and never routable. This way, whoever is at 2.36.0.0 won’t get any unwelcome traffic.
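The client side of the idea could look roughly like the sketch below. It assumes the version is published as 127.A.B.C, so 127.2.36.0 would mean version 2.36.0. The function names are invented for illustration and the domain is just the example name used above, not a live service.

```python
import socket


def version_from_address(address: str) -> tuple:
    """Decode '127.A.B.C' into the version tuple (A, B, C)."""
    octets = [int(part) for part in address.split(".")]
    if (len(octets) != 4 or octets[0] != 127
            or any(not 0 <= octet <= 255 for octet in octets)):
        raise ValueError(f"not a version address: {address!r}")
    return tuple(octets[1:])


def latest_version(domain: str) -> tuple:
    """Resolve the update domain and decode the published version."""
    return version_from_address(socket.gethostbyname(domain))


def update_available(current: tuple, domain: str) -> bool:
    """True if the published version is newer than the running one."""
    return latest_version(domain) > current


# Example, with the hypothetical domain from the text:
#   update_available((2, 35, 0), "nsis.latest-version.org")
```

Tuples compare element by element, which is exactly the ordering wanted for major, minor and revision numbers.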

I am probably not the first to think of this, but it is a cool idea nonetheless. I’m so going to implement this for the next version of NSIS! 🙂

Atomic codes

I had some fun today trying to figure out why Banner likes to hang around with .NET so much that it wouldn’t even leave. I found out that while the banner is being destroyed, something tries to send messages to the main dialog. But the main dialog is busy destroying the banner. I added exactly two iterations of the famous Win32 message loop and everything started working. I still don’t know why those messages are sent, or why it’s so important that they be answered before the banner is destroyed, or even why it happens just with the .NET installer. And don’t even ask about different synchronization methods that make it tick. So far, I’ve found only smoke signals and the fire extinguisher won’t last much longer.

Of all the signals, I liked the message loop the most. It actually points to something I’ve done wrong. I’ve starved the main dialog’s thread while creating a modeless dialog as its child. That’s why I dug further into those two iterations of the loop and the two messages that they process. It turns out both of them had the same identifier – 0xc0c3. Now that’s no regular WM_ message… That’s a message registered with RegisterWindowMessage. But which message is it? That’s where the fun starts. There’s no GetRegisteredWindowMessage API available and nothing on the topic comes up on Google.
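The identifier alone gives the category away, since Windows splits message numbers into documented ranges and RegisterWindowMessage returns values from 0xC000 through 0xFFFF. A small sketch of that classification follows; the function name and the wording of the labels are made up, the range boundaries are the documented ones.

```python
WM_USER = 0x0400           # first message private to a window class
WM_APP = 0x8000            # first message private to an application
REGISTERED_FIRST = 0xC000  # first value RegisterWindowMessage can return
REGISTERED_LAST = 0xFFFF   # last value RegisterWindowMessage can return


def classify_message(msg: int) -> str:
    """Name the documented range a window message identifier falls in."""
    if msg < WM_USER:
        return "system"
    if msg < WM_APP:
        return "WM_USER range"
    if msg < REGISTERED_FIRST:
        return "WM_APP range"
    if msg <= REGISTERED_LAST:
        return "registered with RegisterWindowMessage"
    return "reserved"


print(classify_message(0xC0C3))
```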

So with no leads to follow I started digging. Normally, to give a certain string a specific value in Windows, an atom is created. And indeed, 0xc0c3 is in the range of named atoms. To make things even simpler, in WINE, RegisterWindowMessage simply calls GlobalAddAtom, casts ATOM to UINT and returns. Great, then GetAtomName or GlobalGetAtomName should do the trick. Only reality isn’t as bright as WINE would like us to think. It turns out RegisterWindowMessage uses a different atom table for its messages. But which atom table and how can you even specify a table with GetAtomName?

To specify a table, low-level access to RtlLookupAtomInAtomTable is required. But that function is deep inside ntoskrnl.exe. So, up one level and you get NtUserGetAtomName, which uses the same atom table as NtUserAddAtom, the function RegisterWindowMessage calls. But that’s inside win32k.sys… Luckily, user32.dll already handles that. It has a stub that calls NtUserGetAtomName at 0x7E41FA8E. Some playing around with the second parameter, which turns out to be a UNICODE_STRING, and the atom table is within reach.

Engines off, coding fingers down, digging complete and the message name is MSUIM.Msg.Private. That too gets little to no results on Google, but who cares… Debugging is fun 🙂

For any of you who’d ever want to convert a registered message into a readable name, here’s the NSIS code. Replace 0xc0c3 with the message identifier and 0x7E41FA8E with user32!NtUserGetAtomName and you’re good to go.

# the atom
StrCpy $2 0xc0c3
;System::Call user32::RegisterWindowMessage(t'test_message')i.r2
# create UNICODE_STRING
System::Alloc 1008
Pop $R0
StrCpy $R1 0
StrCpy $R2 1000
IntOp $R3 $R0 + 8
System::Call *$R0(&i2R1,&i2R2,iR3)
# call NtUserGetAtomName
System::Call ::0x7E41FA8E(ir2,iR0)i.r1?e
# parse UNICODE_STRING
System::Call *$R0(&i2.r4,&i2.r3,w.r0)
# print details
DetailPrint "user atom's name is $0"
DetailPrint "length is $4 (???)"
DetailPrint "NtUserGetAtomName returned $1"
Pop $1
DetailPrint "GetLastError() = $1"
# done
System::Free $R0