Monday, May 28, 2007

Write a debugger in 5 minutes with PyDbgEng

The debug mechanism of PyDbgEng is same to other Win32 debugger, just create or attach to a debuggee process and call WaitForEvent to process the debug events, such as create process, load module etc.
#!/usr/bin/env python
import sys
from PyDbgEng import *

c = DebugClient()

c.CreateProcess("ftp.exe", createFlags=[CreateFlags.ATTACH_ONLY_THIS_PROCESS,
CreateFlags.NEW_CONSOLE])

while c.Control.WaitForEvent():
pass
The previous code is a simple debugger
  1. create a debug session with DebugClient()
  2. create the debuggee process with CreateProcess
  3. attach to the new process with CreateFlags.ATTACH_ONLY_THIS_PROCESS
  4. create a new console window for debuggee with CreateFlags.NEW_CONSOLE
To get more debug events, we must add some debug event callback
def onCreateProcess(args):
print "CreateProc: %08x-%08x %s\t%s" % (
args.BaseOffset, args.BaseOffset+args.ModuleSize,
args.ModuleName, args.ImageName)

def onExitProcess(args):
print "ExitProcess %d" % args.ExitCode

def onCreateThread(args):
print "CreateThread %x %08x %08x" % (args.Handle, args.DataOffset, args.StartOffset)

def onExitThread(args):
print "ExitThread %d" % args.ExitCode

def onLoadModule(args):
print "ModLoad: %08x-%08x %s\t%s" % (
args.BaseOffset, args.BaseOffset+args.ModuleSize,
args.ModuleName, args.ImageName)

c.EventCallbacks.CreateProcess = onCreateProcess
c.EventCallbacks.ExitProcess = onExitProcess
c.EventCallbacks.CreateThread = onCreateThread
c.EventCallbacks.ExitThread = onExitThread
c.EventCallbacks.LoadModule = onLoadModule
c.EventCallbacks.Attach()

c.CreateProcess(...)
Now, we will receive 5 kinds of debug events, which allow use show detail information, or do some action, such as add breakpoint, etc. After setting the callback and attach them to debugger, we can got events, like:
CreateProc: 01000000-01012000 ftp ftp.exe
ModLoad: 7c930000-7ca00000 ntdll ntdll.dll
ExitProcess 0
Other kinds of events are about the status or state changing for debug session, debuggee and symbol, we also can use callback to process them
def onSessionStatus(args):
print "SessionStatus: %s" % (str(args.Status))

def onChangeEngineState(args):
sys.stdout.write("EngineState: %s " % str(args.State))

if EngineState.EXECUTION_STATUS == args.State:
print ExecutionStatus.values[args.Argument & 0xf]
else:
print "%x" % args.Argument

c.EventCallbacks.SessionStatus = onSessionStatus
c.EventCallbacks.ChangeEngineState = onChangeEngineState
These events will allow you watch the order of state changing, like
EngineState: SYSTEMS 0
EngineState: EXECUTION_STATUS NO_CHANGE
EngineState: EXTENSIONS 0
SessionStatus: ACTIVE
EngineState: EXECUTION_STATUS BREAK
EngineState: CURRENT_THREAD 0
CreateProc: 01000000-01012000 ftp ftp.exe
EngineState: EXECUTION_STATUS BREAK
EngineState: EXECUTION_STATUS BREAK
EngineState: CURRENT_THREAD 0
ModLoad: 7c930000-7ca00000 ntdll ntdll.dll
EngineState: EXECUTION_STATUS BREAK
EngineState: EXECUTION_STATUS BREAK
EngineState: CURRENT_THREAD 0
To act as a complete debugger, we add and process breakpoint for some predefined function
def onLoadModule(args):
print "ModLoad: %08x-%08x %s\t%s" % (
args.BaseOffset, args.BaseOffset+args.ModuleSize,
args.ModuleName, args.ImageName)

if "WS2_32" == args.ModuleName:
bp = c.Control.AddBreakpoint(flags=[BreakpointFlag.ENABLED],
offset=c.Symbols.GetOffsetByName("WS2_32!socket"))

symbol = c.Symbols.GetNameByOffset(bp.Offset)
print "Add Breakpoint: %s %d @ %08x %s:%d" % (str(bp.Type[0]), bp.Id, bp.Offset, symbol[0], symbol[1])

def onBreakpoint(args):
bp = args.Breakpoint

symbol = c.Symbols.GetNameByOffset(bp.Offset)
print "Hit Breakpoint: %s %d @ %08x %s:%d" % (str(bp.Type[0]), bp.Id, bp.Offset, symbol[0], symbol[1])

return ExecutionStatus.BREAK

c.EventCallbacks.Breakpoint = onBreakpoint
After the WS2_32 module was loaded, we use DebugControl.AddBreakpoint method to create a new code break for WS2_32!socket, which will be called after ftp.exe started. So we add onBreakpoint callback function to show which breakpoint was hit.
ModLoad: 71b60000-71b77000 WS2_32 C:\WINDOWS\system32\WS2_32.dll
EngineState: BREAKPOINTS 0
EngineState: BREAKPOINTS 0
EngineState: BREAKPOINTS 0
Add Breakpoint: CODE 0 @ 71b6410c WS2_32!socket:0
...
EngineState: CURRENT_THREAD 0
Hit Breakpoint: CODE 0 @ 71b6410c WS2_32!socket:0
EngineState: EXECUTION_STATUS BREAK
EngineState: EXECUTION_STATUS GO_HANDLED
Change engine state to GO
EngineState: EXECUTION_STATUS GO_HANDLED
EngineState: EXECUTION_STATUS GO
Besides these expected events, we need another callback to process the exception.
def onException(args):
symbol = c.Symbols.GetNameByOffset(args.Address)
sys.stdout.write("Exception: %08x %08x %s:%d" % (args.Code, args.Address, symbol[0], symbol[1]))

if args.IsFirstChance:
print " first"
else:
print " second"

for frame in c.Control.GetStackFrames():
symbol = c.Symbols.GetNameByOffset(frame.InstructionOffset)
print " %04d %08x %s:%d" % (frame.FrameNumber, frame.InstructionOffset, symbol[0], symbol[1])

print c.Control.Breakpoints

c.EventCallbacks.Exception = onException
The callback will log the exception information, and dump the caller stack with DebugControl.GetStackFrames() method, like
Exception: 000006ba 7c80bee7 kernel32!RaiseException:83 first
0000 7c80bee7 kernel32!RaiseException:83
0001 77c31e37 RPCRT4!RpcpRaiseException:36
0002 77c32042 RPCRT4!NdrGetBuffer:70
0003 77cb30e4 RPCRT4!NdrClientCall2:407
0004 76e35039 DNSAPI!R_ResolverQuery:28
0005 76e34f59 DNSAPI!Query_PrivateExW:391
0006 76e3505f DNSAPI!DnsQuery_W:58
0007 71a83f8e MSWSOCK!SaBlob_Query:45
...
0023 010045c5 ftp!main:1665
0024 01006ee0 ftp!mainCRTStartup:303
0025 7c82f23b kernel32!BaseProcessStart:35
Finally, we add a try...except to protect the WaitForEvent method, because some situation will raise exception
try:
while c.Control.WaitForEvent():
c.Control.ExecutionStatus = ExecutionStatus.GO_HANDLED
print "Change engine state to %s" % c.Control.ExecutionStatus
except:
if ExecutionStatus.NO_DEBUGGEE != c.Control.ExecutionStatus:
print "Unexpected error:", sys.exc_info()[0]
raise
Now, its work, with less than one hundred code lines, and can be expand easy :)

Saturday, May 26, 2007

Access the kernel space with PyDbgEng

One year ago, I wrote a Chinese article <How to use kd/windbg engine to access the kernel space>, now I port the implementation to the PyDbgExt project, so we can directly access the kernel space in python.

>>> from PyDbgEng import *
>>> c = DebugClient()
>>> c.AttachKernel()
>>> c.Control.WaitForEvent()
True
>>> c.Symbols.LoadedModules
{'nt': (Module nt @ ffffffff80800000)}
>>> c.Symbols.GetSymbols("nt!KiServiceTable")
{'KiServiceTable': ((Symbol nt!KiServiceTable), 0)}
>>> offset = c.Symbols.GetSymbols("nt!KiServiceTable").popitem()[1][0].Offset
>>> c.Symbols.GetSymbols(c.DataSpaces.Virtual.ReadPointers(offset)[0])
{'NtAcceptConnectPort': ((Symbol nt!NtAcceptConnectPort), 18446744071571636794L)}
To access the kernel mode, we must attach engine to the local kernel with AttachKernel() first, and begin wait a debug event process with WaitForEvent(). For the kernel mode, this function will return immediately. After this, we can use almost all the functions to access the kernel space, such as modules or symbols.

Under the hood, to support this feature in a standalone python module, I use some dirty hack method, because the debug engine and driver disallow it used outside kd.exe or windbg.exe.
So, before call IDebugClient::AttachKernel method to enter the kernel mode, we must first hook four system functions:
static DWORD WINAPI HookedGetModuleFileNameW(HMODULE hModule, LPWSTR lpFilename, DWORD nSize)
{
DWORD dwSize = s_fnGetModuleFileNameW(hModule, lpFilename, nSize);

if (!hModule)
{
wchar_t *pch = wcsrchr(lpFilename, L'\\');
wcscpy_s(pch ? pch+1 : lpFilename, pch ? (nSize - (pch - lpFilename)) : nSize, L"kd.exe");
dwSize = wcslen(lpFilename);
}

return dwSize;
}
  • GetModuleFileNameW, debug engine use it to got the current executable filename, and check whether the filename end with "kd.exe" or "windbg.exe", but our filename maybe "python.exe"
  • FindResourceW, SizeofResource, LoadResource: debug engine use those functions to find and export the driver, which implement some internal works. I extract it from windbg.exe and embedded into PyDbgEng.dll.
As the previous description mentioned, we embedded the driver "kldbgdrv.sys" to resource, which as type 0x7777 and id 0x4444, like
/////////////////////////////////////////////////////////////////////////////
//
// RCDATA
//

30583 17476 "kldbgdrv.sys"
This tech can be used in any program which wants to access the kernel mode :)

Dump Windows Service Table in WinDbg

buri write a great article <Windows Service Table Dumper for WinDbg> show how to use the built-in script language in WinDbg to do a real job: dump the windows service table. But this script is short of readability, because the build-in script in WinDbg is very strange like its command design.
So, why we can't implement it more easy and readable, base on a friendly python script through PyDbgExt, my python extension for WinDbg :)

First we need define a script module, such as dumpServiceTable.py, which includes a function dumpServiceTable to dump that table, and import the dependence modules
from PyDbgEng import *
from struct import *

c = DebugClient.Current
s = c.Symbols
v = c.DataSpaces.Virtual
Next, we got the common base object, such as DebugClient.Current which is the current debug session in windbg; Symbols and DataSpaces.Virtual will support us query the debug symbol and read/write the virtual address space.
def getSymbol(name):
return s.GetSymbols(name).popitem()[1][0]

def getSymbol(offset):
return s.GetSymbols(offset).popitem()[1][0]

def readDWORD(offset):
return unpack_from("L", v.Read(offset, 4))[0]

To make the code more readable, we define some utility functions: getSymbol can get the symbol object with its name or offset; readDWORD read unsigned long from the offset. According to the result type of VirtualDataSpace.Read function is a buffer object, we need use unpack_from function to decode the buffer.
def dumpServiceTable():
KiServiceTable = getSymbol("nt!KiServiceTable")
KiServiceLimit = getSymbol("nt!KiServiceLimit")

idx = 0

for addr in v.ReadPointers(KiServiceTable.Offset, readDWORD(KiServiceLimit.Offset)):
try:
symbol = getSymbol(addr)

symbolName = "%s!%s" % (symbol.Module.ModuleName, symbol.Name)
except:
symbolName = ""

print "%03d %08x %s" % (idx, addr & 0xFFFFFFFF, symbolName)

idx = idx + 1
The last part of code read and dump the service table:
  1. get the symbol object of nt!KiServiceTable and nt!KiServiceLimit
  2. read a group of pointers from the begin of table
  3. try to get the symbol object for every entry in table
  4. if the symbol exists, dump it's address, module and name
  5. if the symbol nonexists, just show warning. we can provide more information about this in future
Finally, we load the script to windbg and execute it :)
lkd> .extpath+ D:\Study\Win32\PyDbgExt\Binary\debug
Extension search path is: ...;D:\Study\Win32\PyDbgExt\Binary\debug
lkd> .load PyDbgExt
lkd> .chain
Extension DLL search Path:
...
Extension DLL chain:
PyDbgExt: API 1.0.0, built Sat May 26 02:17:49 2007
[path: D:\Study\Win32\PyDbgExt\Binary\debug\PyDbgExt.dll]
dbghelp: image 6.7.0005.0, API 6.0.6, built Fri Mar 30 02:08:09 2007
[path: D:\MS\Debugging Tools for Windows\dbghelp.dll]
...
lkd> !import dumpServiceTable
Import succeeded.
lkd> !eval dumpServiceTable.dumpServiceTable()
000 8092023a nt!NtAcceptConnectPort
001 8096b71e nt!NtAccessCheck
002 8096f9be nt!NtAccessCheckAndAuditAlarm
...
032 808b9810 nt!NtCompressKey
033 f4bed0d2
034 8088d0c8 nt!NtContinue
...
If there some wrong in script, just edit it and reload it with python build-in function
lkd> !eval reload(dumpServiceTable)
Enjoy it :)

Wednesday, May 16, 2007

An alternative open source virtualization solution: VirtualBox

To accelerate the product development and test, we usual choose some kinds of virtualizer to simulate environment, such as VMware workstation, Virtual PC etc. With these kinds of VM, we can build the environment one time, and reuse it again and again, just take snapshot and rollback it.

But if we have some advance requirement, such as control the VM to implement some auto-test script, these commercial products maybe restrict your idea. Even they provided some SDK, the ability is very limited.

Fortunately we have some alternative virtualization solutions, such as VirtualBox. It is a general-purpose full virtualizer for x86 hardware. Targeted at server, desktop and embedded use, it is now the only professional-quality virtualization solution that is also Open Source Software.

The usage and GUI of VirtualBox is very like VMware, so we can switch to it easy. The VirtualBox can be downloaded and installed on Windows, Linux and OS X hosts; And the guest OS can support mostly common platform, such as Windows, Linux, xxxBSD, etc. A complete guest OS list can be found at Guest OSes. It also has an open source edition under the GPL license, but it isn't including some advance features, such as RDP support.

Compare with VMware or other VM vender, VirtualBox has a great openness. It's virtual machine descriptions in XML, we can easy fetch information from it; It's VM directly support RDP (Remote Desktop Protocol) protocol, so we can use remote desktop connection software to control it, the latest version of VMware - Workstation 6.0 provide similar feature which allow remotely access the console of a VM from a VNC client.

Furthermore, the most important difference is VirtualBox provided a very powerful management interface. For end-users, they can use VBoxManage in command line to control most common functions of VM; for developers, they can get some depth control with a set of COM/XPCOM control interfaces, such as control the keyboard/mouse action etc.

From the implementation viewpoint, the architecture of VirtualBox much likes a client/server solution. The controller client, such as QT-based GUI or console mode VBoxManage, connects to the backend VM implementation (VBoxVMM, etc.), through COM/XPCOM interface (VBoxSVC, VBoxC, etc).

In addition, a kernel driver (VBoxDrv.sys on Windows) or module (vboxdrv on Linux) will allocating physical memory for the VM, control the context switch and other dirty work.

But not like Xen and VMware, VirtualBox is not directly depending on some kinds of hardware virtualization technology, such as VT or AMD-V. VirtualBox run the guest OS kernel at ring 1, through some advanced code scanning, analysis and patching techniques, such as "Patch Manager" (PATM) and "Code Scanning and Analysis Manager" (CSAM).

Before executing ring 0 code, we scan it recursively to discover problematic instructions. We then perform in-situ patching, i.e. we replace the instruction with a jump to hypervisor memory where an integrated code generator has placed a more suitable implementation. In reality, this is a very complex task as there are lots of odd situations to be discovered and handled correctly. So, with its current complexity, one could argue that PATM is an advanced in-situ recompiler.

In addition, every time a fault occurs, we analyze the fault's cause to determine if it is possible to patch the offending code to prevent it from causing more expensive faults in the future. This turns out to work very well, and we can reduce the faults caused by our virtualization to a rate that performs much better than a typical recompiler, or even VT-x technology, for that matter.

This method seems specially but effective, at least on my machine, VirtualBox run some OS correctly and very fast.

If you have more detail question, please refer to their Developer FAQ or directly read the source code. :)