Tuesday, August 7, 2007

New challenge for malware detection: virtualization-based rootkits


Several days ago, Invisible Things Lab released a new open source project named Blue Pill, the first battle-ready rootkit based on hardware virtualization. Even though the code in this version is not sophisticated, I believe its impact is profound. It is the starting gun for a new trend in rootkits and malware, which will move the battlefield from inside the OS to the VM level.


Just two years ago, virtualization could only be implemented through software emulation, based on interpreters, binary translation, etc. We had some software solutions, including VMware Workstation, Microsoft Virtual PC and VirtualBox, but it was hard for them to deliver enough performance and compatibility.

But as dual-core and x64 CPUs became common, hardware virtualization solutions became mainstream, including Xen, VMware ESX Server, Microsoft Longhorn, etc. These solutions are based on CPU-level support, namely Intel VT and AMD Pacifica (AMD-V), which introduces a new isolated privilege level beside x86's rings 0-3.

A mini OS kernel runs in hypervisor mode as the virtual machine monitor (VMM) and manages multiple guest OSes running in normal mode as virtual machines (VMs). The VMM can monitor the status of a VM and take over some of its operations, such as I/O and privileged instructions.

This is the common workflow the designers expected.

But the world is not perfect. Malicious people can also use those features to bypass traditional security solutions.

First, some white hats from the University of Michigan and MSR released a paper, "SubVirt: Implementing Malware with Virtual Machines", in 2006. They discuss the possibility of a new type of malware, named virtual-machine based rootkit (VMBR), which installs a virtual-machine monitor underneath an existing operating system and hoists the original operating system into a virtual machine.

Second, Joanna Rutkowska presented her hardware-virtualization-based rootkit, named Blue Pill, at the Black Hat Briefings 2006. This implementation is based on CPU support, doesn't need any binary translation, and is very hard to detect from inside the VM.

Besides, Dino Dai Zovi from Matasano also presented on hardware virtualization rootkits at Black Hat USA 2006, with an implementation based on Xen 3.0.

In my experience, a new concept needs one or two years to go from idea to malware. Now one year has passed, the source is available, and hardware support will become more and more widespread. Almost everything for this new type of malware is ready; only two actors are still missing: a hardware-virtualization-based rootkit in the real world, and a security solution that can detect and clean it.

Tuesday, July 10, 2007

Wireshark Dissector Plugin for Look'n'Stop

From a developer's viewpoint, Look'n'Stop is a great personal firewall. Its design may not be very clear to the ordinary user, but if you have enough background knowledge, it can be a powerful analyzer of security threats.


After a packet is allowed or blocked by a rule, Look'n'Stop records it in the log and provides a dialog with detailed information. But this information was not enough for me, so I decided to write a plugin to get more :)


Fortunately, it provides a plugin API for the log display and the rule editor. Through those interfaces, I can pop up my dissector dialog to display the protocol tree of a packet.
To avoid reinventing the wheel, I chose Wireshark as the backend dissector, because Wireshark, better known under its former name Ethereal, is the best open source network protocol analyzer and the standard in many industries.

Although Wireshark encapsulates all its dissectors in a library, that library's interface is neither clear nor stable, so I decided to use its terminal-based edition, tshark, as the main dissector, because it can read packets from stdin and dump the protocol tree to stdout as XML.
So the main data and control flow is:
  1. Look'n'Stop passes the packet data to our plugin through its API
  2. the plugin forks a tshark process for dissection
  3. the plugin dumps the packet in libpcap format to tshark's stdin
  4. tshark dissects the packet into a protocol tree and writes it as XML to stdout
  5. the plugin fetches the XML output and parses it with expat
  6. the plugin pops up a tree-based dialog and renders the protocol tree
  7. the popup dialog provides more features, for example saving the packet in libpcap format
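
Steps 3 to 5 can be sketched roughly in Python. This is only an illustration, not the plugin's code: the plugin is native code and parses with expat directly, the helper names (to_pcap, dissect, protocol_tree) are made up here, the link type is assumed to be Ethernet, and the `tshark -r - -T pdml` invocation assumes a reasonably recent tshark that can be found on the PATH.

```python
import struct
import subprocess
import xml.etree.ElementTree as ET  # ElementTree parses with expat internally

LINKTYPE_ETHERNET = 1  # assumption: the firewall hands us Ethernet frames


def to_pcap(packet, ts_sec=0, ts_usec=0):
    """Wrap one raw packet in a minimal libpcap file (global + record header)."""
    global_header = struct.pack(
        "<IHHiIII",
        0xA1B2C3D4,  # magic number (microsecond timestamps)
        2, 4,        # file format version 2.4
        0, 0,        # timezone offset, timestamp accuracy
        65535,       # snaplen
        LINKTYPE_ETHERNET,
    )
    # Record header: timestamp, captured length, original length
    record_header = struct.pack("<IIII", ts_sec, ts_usec, len(packet), len(packet))
    return global_header + record_header + packet


def dissect(packet, tshark="tshark"):
    """Pipe the packet to tshark's stdin and return its PDML (XML) output."""
    proc = subprocess.run(
        [tshark, "-r", "-", "-T", "pdml"],
        input=to_pcap(packet), stdout=subprocess.PIPE, check=True,
    )
    return proc.stdout


def protocol_tree(pdml):
    """Return (protocol name, [field names]) pairs for the first packet."""
    packet = ET.fromstring(pdml).find("packet")
    return [(proto.get("name"), [f.get("name") for f in proto.iter("field")])
            for proto in packet.findall("proto")]
```

Calling protocol_tree(dissect(raw_bytes)) would then give the data that the tree dialog has to render.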

Combining those steps, we get a new dissector plugin :)

You can select a field in the protocol tree, and the corresponding data bytes will be highlighted in the editor at the bottom. If you want to save the packet for further analysis, just right-click the window title and choose "Save As" in the system menu. It supports saving the packet in libpcap, XML and text format.

But before this, you should download and install Wireshark, and configure its installation path in the plugin.


If you enter a valid path, the plugin fetches the version and copyright information from tshark and saves them to the registry for future reuse.

For users in China, I integrated location information for IP addresses. You should download the latest IP database from cz88.net and configure it on the QQWry settings page.

After you choose "Use QQWry ...", location information will be appended to some IP fields in the protocol tree.
If you are interested, please download the prebuilt binary or compile it yourself.
That's all. If you have any advice or want to improve it yourself, please contact me directly :)

Monday, May 28, 2007

Write a debugger in 5 minutes with PyDbgEng

PyDbgEng's debug mechanism is the same as in any other Win32 debugger: create or attach to a debuggee process, then call WaitForEvent to process debug events such as process creation, module loading, etc.
#!/usr/bin/env python
import sys
from PyDbgEng import *

c = DebugClient()

c.CreateProcess("ftp.exe", createFlags=[CreateFlags.ATTACH_ONLY_THIS_PROCESS,
                                        CreateFlags.NEW_CONSOLE])

while c.Control.WaitForEvent():
    pass
The code above is a simple debugger:
  1. create a debug session with DebugClient()
  2. create the debuggee process with CreateProcess
  3. attach to the new process with CreateFlags.ATTACH_ONLY_THIS_PROCESS
  4. create a new console window for debuggee with CreateFlags.NEW_CONSOLE
To get more debug events, we must add some debug event callbacks:
def onCreateProcess(args):
    print "CreateProc: %08x-%08x %s\t%s" % (
        args.BaseOffset, args.BaseOffset+args.ModuleSize,
        args.ModuleName, args.ImageName)

def onExitProcess(args):
    print "ExitProcess %d" % args.ExitCode

def onCreateThread(args):
    print "CreateThread %x %08x %08x" % (args.Handle, args.DataOffset, args.StartOffset)

def onExitThread(args):
    print "ExitThread %d" % args.ExitCode

def onLoadModule(args):
    print "ModLoad: %08x-%08x %s\t%s" % (
        args.BaseOffset, args.BaseOffset+args.ModuleSize,
        args.ModuleName, args.ImageName)

c.EventCallbacks.CreateProcess = onCreateProcess
c.EventCallbacks.ExitProcess = onExitProcess
c.EventCallbacks.CreateThread = onCreateThread
c.EventCallbacks.ExitThread = onExitThread
c.EventCallbacks.LoadModule = onLoadModule
c.EventCallbacks.Attach()

c.CreateProcess(...)
Now we will receive 5 kinds of debug events, which allow us to show detailed information or take some action, such as adding a breakpoint. After setting the callbacks and attaching them to the debugger, we get events like:
CreateProc: 01000000-01012000 ftp ftp.exe
ModLoad: 7c930000-7ca00000 ntdll ntdll.dll
ExitProcess 0
The other kinds of events concern status or state changes of the debug session, the debuggee and the symbols; we can also use callbacks to process them:
def onSessionStatus(args):
    print "SessionStatus: %s" % (str(args.Status))

def onChangeEngineState(args):
    sys.stdout.write("EngineState: %s " % str(args.State))

    if EngineState.EXECUTION_STATUS == args.State:
        print ExecutionStatus.values[args.Argument & 0xf]
    else:
        print "%x" % args.Argument

c.EventCallbacks.SessionStatus = onSessionStatus
c.EventCallbacks.ChangeEngineState = onChangeEngineState
These events allow you to watch the order of state changes, like:
EngineState: SYSTEMS 0
EngineState: EXECUTION_STATUS NO_CHANGE
EngineState: EXTENSIONS 0
SessionStatus: ACTIVE
EngineState: EXECUTION_STATUS BREAK
EngineState: CURRENT_THREAD 0
CreateProc: 01000000-01012000 ftp ftp.exe
EngineState: EXECUTION_STATUS BREAK
EngineState: EXECUTION_STATUS BREAK
EngineState: CURRENT_THREAD 0
ModLoad: 7c930000-7ca00000 ntdll ntdll.dll
EngineState: EXECUTION_STATUS BREAK
EngineState: EXECUTION_STATUS BREAK
EngineState: CURRENT_THREAD 0
To act as a complete debugger, we add and handle a breakpoint for a predefined function:
def onLoadModule(args):
    print "ModLoad: %08x-%08x %s\t%s" % (
        args.BaseOffset, args.BaseOffset+args.ModuleSize,
        args.ModuleName, args.ImageName)

    if "WS2_32" == args.ModuleName:
        bp = c.Control.AddBreakpoint(flags=[BreakpointFlag.ENABLED],
                                     offset=c.Symbols.GetOffsetByName("WS2_32!socket"))

        symbol = c.Symbols.GetNameByOffset(bp.Offset)
        print "Add Breakpoint: %s %d @ %08x %s:%d" % (str(bp.Type[0]), bp.Id, bp.Offset, symbol[0], symbol[1])

def onBreakpoint(args):
    bp = args.Breakpoint

    symbol = c.Symbols.GetNameByOffset(bp.Offset)
    print "Hit Breakpoint: %s %d @ %08x %s:%d" % (str(bp.Type[0]), bp.Id, bp.Offset, symbol[0], symbol[1])

    return ExecutionStatus.BREAK

c.EventCallbacks.Breakpoint = onBreakpoint
After the WS2_32 module is loaded, we use the DebugControl.AddBreakpoint method to create a new code breakpoint for WS2_32!socket, which will be called after ftp.exe starts. We also add an onBreakpoint callback function to show which breakpoint was hit.
ModLoad: 71b60000-71b77000 WS2_32 C:\WINDOWS\system32\WS2_32.dll
EngineState: BREAKPOINTS 0
EngineState: BREAKPOINTS 0
EngineState: BREAKPOINTS 0
Add Breakpoint: CODE 0 @ 71b6410c WS2_32!socket:0
...
EngineState: CURRENT_THREAD 0
Hit Breakpoint: CODE 0 @ 71b6410c WS2_32!socket:0
EngineState: EXECUTION_STATUS BREAK
EngineState: EXECUTION_STATUS GO_HANDLED
Change engine state to GO
EngineState: EXECUTION_STATUS GO_HANDLED
EngineState: EXECUTION_STATUS GO
Besides these expected events, we need another callback to process exceptions:
def onException(args):
    symbol = c.Symbols.GetNameByOffset(args.Address)
    sys.stdout.write("Exception: %08x %08x %s:%d" % (args.Code, args.Address, symbol[0], symbol[1]))

    if args.IsFirstChance:
        print " first"
    else:
        print " second"

    for frame in c.Control.GetStackFrames():
        symbol = c.Symbols.GetNameByOffset(frame.InstructionOffset)
        print " %04d %08x %s:%d" % (frame.FrameNumber, frame.InstructionOffset, symbol[0], symbol[1])

    print c.Control.Breakpoints

c.EventCallbacks.Exception = onException
The callback logs the exception information and dumps the call stack with the DebugControl.GetStackFrames() method, like:
Exception: 000006ba 7c80bee7 kernel32!RaiseException:83 first
0000 7c80bee7 kernel32!RaiseException:83
0001 77c31e37 RPCRT4!RpcpRaiseException:36
0002 77c32042 RPCRT4!NdrGetBuffer:70
0003 77cb30e4 RPCRT4!NdrClientCall2:407
0004 76e35039 DNSAPI!R_ResolverQuery:28
0005 76e34f59 DNSAPI!Query_PrivateExW:391
0006 76e3505f DNSAPI!DnsQuery_W:58
0007 71a83f8e MSWSOCK!SaBlob_Query:45
...
0023 010045c5 ftp!main:1665
0024 01006ee0 ftp!mainCRTStartup:303
0025 7c82f23b kernel32!BaseProcessStart:35
Finally, we add a try...except around the WaitForEvent loop, because some situations will raise an exception:
try:
    while c.Control.WaitForEvent():
        c.Control.ExecutionStatus = ExecutionStatus.GO_HANDLED
        print "Change engine state to %s" % c.Control.ExecutionStatus
except:
    if ExecutionStatus.NO_DEBUGGEE != c.Control.ExecutionStatus:
        print "Unexpected error:", sys.exc_info()[0]
        raise
Now it works, in less than one hundred lines of code, and it can be extended easily :)

Saturday, May 26, 2007

Access the kernel space with PyDbgEng

One year ago, I wrote a Chinese article, "How to use kd/windbg engine to access the kernel space". Now I have ported the implementation to the PyDbgExt project, so we can access the kernel space directly from Python.

>>> from PyDbgEng import *
>>> c = DebugClient()
>>> c.AttachKernel()
>>> c.Control.WaitForEvent()
True
>>> c.Symbols.LoadedModules
{'nt': (Module nt @ ffffffff80800000)}
>>> c.Symbols.GetSymbols("nt!KiServiceTable")
{'KiServiceTable': ((Symbol nt!KiServiceTable), 0)}
>>> offset = c.Symbols.GetSymbols("nt!KiServiceTable").popitem()[1][0].Offset
>>> c.Symbols.GetSymbols(c.DataSpaces.Virtual.ReadPointers(offset)[0])
{'NtAcceptConnectPort': ((Symbol nt!NtAcceptConnectPort), 18446744071571636794L)}
To access kernel mode, we must first attach the engine to the local kernel with AttachKernel(), and then wait for a debug event with WaitForEvent(). In kernel mode, this function returns immediately. After this, we can use almost all the functions to access the kernel space, such as modules or symbols.

Under the hood, to support this feature in a standalone Python module, I use a dirty hack, because the debug engine and its driver do not allow use outside kd.exe or windbg.exe.
So, before calling the IDebugClient::AttachKernel method to enter kernel mode, we must first hook four system functions:
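static DWORD WINAPI HookedGetModuleFileNameW(HMODULE hModule, LPWSTR lpFilename, DWORD nSize)
{
    DWORD dwSize = s_fnGetModuleFileNameW(hModule, lpFilename, nSize);

    // hModule == NULL asks for the path of the current executable
    if (!hModule)
    {
        // Replace the file name part (after the last backslash) with "kd.exe"
        wchar_t *pch = wcsrchr(lpFilename, L'\\');
        wchar_t *name = pch ? pch + 1 : lpFilename;

        // The remaining capacity is counted from where the new name starts
        wcscpy_s(name, nSize - (name - lpFilename), L"kd.exe");
        dwSize = (DWORD) wcslen(lpFilename);
    }

    return dwSize;
}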
  • GetModuleFileNameW: the debug engine uses it to get the current executable's filename and checks whether it ends with "kd.exe" or "windbg.exe", but our filename may be "python.exe"
  • FindResourceW, SizeofResource, LoadResource: the debug engine uses those functions to find and extract the driver that does some of the internal work. I extracted it from windbg.exe and embedded it into PyDbgEng.dll.
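
To make the GetModuleFileNameW trick easier to follow, here is a small Python model of the same string manipulation. It is only a sketch: the suffix check is my reading of the engine's behavior as described above, and the function names are invented for this example.

```python
import ntpath  # Windows path rules, usable on any platform

DEBUGGER_NAMES = ("kd.exe", "windbg.exe")


def looks_like_debugger(path):
    """Model of the engine's check: does the executable name end as expected?"""
    return path.lower().endswith(DEBUGGER_NAMES)


def spoof_module_filename(path):
    """Model of the hook: keep the directory, swap the file name for kd.exe."""
    directory, _ = ntpath.split(path)
    return ntpath.join(directory, "kd.exe") if directory else "kd.exe"


original = r"C:\Python25\python.exe"
print(looks_like_debugger(original))                         # False
print(looks_like_debugger(spoof_module_filename(original)))  # True
```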
As mentioned above, we embed the driver "kldbgdrv.sys" as a resource with type 0x7777 and id 0x4444, like:
/////////////////////////////////////////////////////////////////////////////
//
// RCDATA
//

30583 17476 "kldbgdrv.sys"
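
The two numbers in the resource script are simply the decimal forms of the hexadecimal values mentioned above, which a quick check in Python confirms:

```python
# The .rc file stores the identifiers in decimal; the text quotes them in hex.
for text, value in (("0x7777", 0x7777), ("0x4444", 0x4444)):
    print("%s = %d" % (text, value))
# 0x7777 = 30583
# 0x4444 = 17476
```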
This technique can be used in any program that wants to access kernel mode :)

Dump Windows Service Table in WinDbg

buri wrote a great article, "Windows Service Table Dumper for WinDbg", which shows how to use WinDbg's built-in script language to do a real job: dump the Windows service table. But the script lacks readability, because WinDbg's built-in script language is as strange as its command design.
So why not implement it in an easier and more readable way, with a friendly Python script running through PyDbgExt, my Python extension for WinDbg :)

First we need to define a script module, such as dumpServiceTable.py, which contains a function dumpServiceTable to dump that table, and import the modules it depends on:
from PyDbgEng import *
from struct import *

c = DebugClient.Current
s = c.Symbols
v = c.DataSpaces.Virtual
Next, we get the common base objects: DebugClient.Current is the current debug session in WinDbg; Symbols and DataSpaces.Virtual let us query the debug symbols and read/write the virtual address space.
def getSymbol(nameOrOffset):
    # GetSymbols accepts either a symbol name or an offset
    return s.GetSymbols(nameOrOffset).popitem()[1][0]

def readDWORD(offset):
    return unpack_from("L", v.Read(offset, 4))[0]

To make the code more readable, we define some utility functions: getSymbol gets the symbol object by its name or offset; readDWORD reads an unsigned long from the offset. Because the VirtualDataSpace.Read function returns a buffer object, we need the unpack_from function to decode the buffer.
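
The decoding step can be tried without a debugger. The sketch below feeds unpack_from a hand-made 4-byte buffer instead of the result of VirtualDataSpace.Read; read_dword is a made-up stand-in for readDWORD. I use the "<L" format here because a plain "L" follows the platform's native unsigned long, which is 4 bytes on Windows but 8 bytes on most 64-bit Linux builds.

```python
from struct import calcsize, unpack_from


def read_dword(buf, offset=0):
    """Decode a little-endian unsigned 32-bit value from a buffer."""
    return unpack_from("<L", buf, offset)[0]


raw = b"\x3a\x02\x92\x80"        # bytes as they would sit in x86 memory
print(calcsize("<L"))            # 4
print("%08x" % read_dword(raw))  # 8092023a
```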
def dumpServiceTable():
    KiServiceTable = getSymbol("nt!KiServiceTable")
    KiServiceLimit = getSymbol("nt!KiServiceLimit")

    idx = 0

    for addr in v.ReadPointers(KiServiceTable.Offset, readDWORD(KiServiceLimit.Offset)):
        try:
            symbol = getSymbol(addr)

            symbolName = "%s!%s" % (symbol.Module.ModuleName, symbol.Name)
        except:
            symbolName = ""

        print "%03d %08x %s" % (idx, addr & 0xFFFFFFFF, symbolName)

        idx = idx + 1
The last part of the code reads and dumps the service table:
  1. get the symbol objects of nt!KiServiceTable and nt!KiServiceLimit
  2. read a group of pointers from the beginning of the table
  3. try to get the symbol object for every entry in the table
  4. if the symbol exists, dump its address, module and name
  5. if the symbol doesn't exist, just show the address with an empty name. We can provide more information about this in the future
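
A note on the `addr & 0xFFFFFFFF` in the print statement: the engine reports 32-bit kernel addresses sign-extended to 64 bits, which is why the interactive session in the previous post showed 18446744071571636794 for nt!NtAcceptConnectPort. Masking keeps only the low 32 bits. A worked example with that same value (the helper names are made up):

```python
def low32(addr):
    """Drop the sign-extension bits and keep the 32-bit address."""
    return addr & 0xFFFFFFFF


def sign_extend32(value):
    """Inverse operation: extend a 32-bit address to the 64-bit form."""
    return value | 0xFFFFFFFF00000000 if value & 0x80000000 else value


addr = 18446744071571636794
print("%016x -> %08x" % (addr, low32(addr)))  # ffffffff8092023a -> 8092023a
```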
Finally, we load the script into WinDbg and execute it :)
lkd> .extpath+ D:\Study\Win32\PyDbgExt\Binary\debug
Extension search path is: ...;D:\Study\Win32\PyDbgExt\Binary\debug
lkd> .load PyDbgExt
lkd> .chain
Extension DLL search Path:
...
Extension DLL chain:
PyDbgExt: API 1.0.0, built Sat May 26 02:17:49 2007
[path: D:\Study\Win32\PyDbgExt\Binary\debug\PyDbgExt.dll]
dbghelp: image 6.7.0005.0, API 6.0.6, built Fri Mar 30 02:08:09 2007
[path: D:\MS\Debugging Tools for Windows\dbghelp.dll]
...
lkd> !import dumpServiceTable
Import succeeded.
lkd> !eval dumpServiceTable.dumpServiceTable()
000 8092023a nt!NtAcceptConnectPort
001 8096b71e nt!NtAccessCheck
002 8096f9be nt!NtAccessCheckAndAuditAlarm
...
032 808b9810 nt!NtCompressKey
033 f4bed0d2
034 8088d0c8 nt!NtContinue
...
If there is something wrong in the script, just edit it and reload it with Python's built-in reload function:
lkd> !eval reload(dumpServiceTable)
Enjoy it :)

Wednesday, May 16, 2007

An alternative open source virtualization solution: VirtualBox

To accelerate product development and testing, we usually choose some kind of virtualizer to simulate the environment, such as VMware Workstation or Virtual PC. With these kinds of VMs, we can build the environment once and reuse it again and again, just by taking a snapshot and rolling back to it.

But if we have some advanced requirements, such as controlling the VM from an automated test script, these commercial products may restrict your ideas. Even though they provide some SDKs, their abilities are very limited.

Fortunately we have some alternative virtualization solutions, such as VirtualBox. It is a general-purpose full virtualizer for x86 hardware. Targeted at server, desktop and embedded use, it is now the only professional-quality virtualization solution that is also Open Source Software.

The usage and GUI of VirtualBox are very similar to VMware's, so we can switch to it easily. VirtualBox can be downloaded and installed on Windows, Linux and OS X hosts, and it supports most common platforms as guest OS, such as Windows, Linux, xxxBSD, etc. A complete list can be found on the Guest OSes page. It also has an open source edition under the GPL license, but that edition doesn't include some advanced features, such as RDP support.

Compared with VMware or other VM vendors, VirtualBox is very open. Its virtual machine descriptions are stored in XML, so we can easily fetch information from them. Its VMs directly support RDP (Remote Desktop Protocol), so we can control them with remote desktop connection software; the latest version of VMware, Workstation 6.0, provides a similar feature that allows remote access to a VM's console from a VNC client.

Furthermore, the most important difference is that VirtualBox provides a very powerful management interface. End users can use VBoxManage on the command line to control the most common functions of a VM; developers can get deeper control through a set of COM/XPCOM interfaces, for example to control keyboard and mouse actions.
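
For an auto-test script, the command-line route is usually the quickest. Below is a minimal sketch that rolls a VM back to a snapshot and boots it again. The VM and snapshot names are placeholders, the helper functions are made up, and the subcommands follow the syntax of current VBoxManage releases (snapshot ... restore, startvm ... --type headless), which may differ from the version available when this post was written.

```python
import subprocess


def vboxmanage_cmd(*args, executable="VBoxManage"):
    """Build the argument list for one VBoxManage invocation."""
    return [executable] + [str(a) for a in args]


def restore_and_start(vm, snapshot):
    """Commands that roll a powered-off VM back to a snapshot and boot it."""
    return [
        vboxmanage_cmd("snapshot", vm, "restore", snapshot),
        vboxmanage_cmd("startvm", vm, "--type", "headless"),
    ]


def run_all(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in restore_and_start("winxp-test", "clean"):
    print(" ".join(cmd))
```

Keeping the command construction separate from run_all makes it possible to log or review the commands before anything touches a real VM.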

From the implementation viewpoint, the architecture of VirtualBox is much like a client/server solution. The controller client, such as the Qt-based GUI or the console-mode VBoxManage, connects to the backend VM implementation (VBoxVMM, etc.) through COM/XPCOM interfaces (VBoxSVC, VBoxC, etc.).

In addition, a kernel driver (VBoxDrv.sys on Windows) or module (vboxdrv on Linux) allocates physical memory for the VM, controls context switches and does other dirty work.

But unlike Xen and VMware, VirtualBox does not directly depend on any kind of hardware virtualization technology, such as VT or AMD-V. VirtualBox runs the guest OS kernel at ring 1, using advanced code scanning, analysis and patching techniques, such as the "Patch Manager" (PATM) and the "Code Scanning and Analysis Manager" (CSAM). The VirtualBox developers describe the approach like this:

Before executing ring 0 code, we scan it recursively to discover problematic instructions. We then perform in-situ patching, i.e. we replace the instruction with a jump to hypervisor memory where an integrated code generator has placed a more suitable implementation. In reality, this is a very complex task as there are lots of odd situations to be discovered and handled correctly. So, with its current complexity, one could argue that PATM is an advanced in-situ recompiler.

In addition, every time a fault occurs, we analyze the fault's cause to determine if it is possible to patch the offending code to prevent it from causing more expensive faults in the future. This turns out to work very well, and we can reduce the faults caused by our virtualization to a rate that performs much better than a typical recompiler, or even VT-x technology, for that matter.

This method seems unusual but effective; at least on my machine, VirtualBox runs several OSes correctly and very fast.

If you have more detailed questions, please refer to their Developer FAQ or read the source code directly. :)