RCE Endeavors 😅

December 2, 2014

Writing a Primitive Debugger: Part 2 (Breakpoints/Stepping)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 9:06 AM

For a debugger to be of any practical use, it needs the ability to pause, inspect, and resume a target process. This post will cover these topics by discussing what breakpoints are, how they are implemented by most debuggers on the x86 and x64 Windows platforms, and how to perform instruction level stepping.

Breakpoints
A breakpoint can simply be defined as a place in a executing code that causes an intentional halt of execution. In the context of debuggers, this is useful because it allows the debugger to inspect the process at that moment in time when nothing is going on — when the process is effectively “broken”. Typical functionality that is seen in debuggers when the target is in a broken state is the ability to view and modify registers/memory, print a disassembly listing of the area surrounding the breakpoint, step into or step over assembly instructions or lines of source code, and other related diagnostic features. Breakpoints themselves can come in a few different varieties.

Hardware Interrupt Breakpoints
Hardware interrupt breakpoints are probably the most common and simple breakpoints to understand and to implement in a debugger. These are specific to the x86 and x64 architectures and are implemented by using a hardware-supported instruction specifically made to generate a software interrupt that is meant for debuggers. The opcode for this instruction is 0xCC and it matches to an INT 3 instruction. The way that most debuggers implement this is to replace the opcode at the desired address — that is, the breakpoint address — with the 0xCC opcode. When the code is called, the interrupt (EXCEPTION_BREAKPOINT) will be raised, which the debugger will handle and give the user the option to perform the debugging functionality mentioned above. When the user is finished inspecting the program state at that address and wishes to continue, the debugger will replace the original instruction back, make sure that the executing address (EIP or RIP registers, depending on the architecture) point to that original address, and continue execution.

However, an interesting problem comes up. When the original instruction is replaced back, the breakpoint will be lost. This may be alright if the breakpoint is meant to be hit only once, but that is very rarely the case. There needs to be a way to re-enable that breakpoint immediately afterwards. This is done by setting the EFlags (or RFlags for x64) register to set the processor into single-step mode. Fortunately, that is pretty easily accomplished by enabling the 8th bit, i.e. performing an OR with 0x100. When the EXCEPTION_BREAKPOINT exception is handled and execution resumes, there will be another exception, EXCEPTION_SINGLE_STEP, thrown on the next instruction executed. At this point, the debugger can re-enable the breakpoint on the previous instruction and resume execution.

On the Windows platform, it is easy to see all of this at work by inspecting the DebugBreak function. This function is meant to trigger a local (from within the same process) breakpoint. Below is the disassembly of the function, which shows it simply being what is described above. Those wondering about the mov edi, edi instruction can read more about hot-patchable images, specifically Raymond Chen’s post about the explanation.

_DebugBreak@0:
752C3C5D 8B FF                mov         edi,edi  
752C3C5F CC                   int         3  
752C3C60 C3                   ret  

Hardware Debug Registers
The second breakpoint implementation technique is also specific to the x86 and x64 architectures and takes advantage of specific debug registers provided by the instruction set. These are the debug registers DR0, DR1, DR2, DR3, and DR7. The first four registers are used to hold the addresses that a hardware breakpoint will break on. Using these means that for the entire program, there can be at most four hardware breakpoints active. The DR7 register is used to control enablement of these registers, with bits 0, 2, 4, 6 corresponding to on/off for DR0, … , DR3. Additionally, bits 16-17, 20-21, 24-25, and 28-29 function as a bitmask for DR0, … , DR3 for when these breakpoints will trigger, with 00 being on execution, 01 on read, and 11 on write.

Setting these breakpoints on the Windows platform is a bit tricky. They must be set on the processes main thread. This involves getting the main thread, opening a handle to it with at least THREAD_GET_CONTEXT and THREAD_SET_CONTEXT privileges, and getting/setting the threads context with GetThreadContext/SetThreadContext with the newly added debug registers to reflect the changes. Take note that no executable code in memory is modified to set these breakpoints. It is not like the previous case where an opcode had to be replaced. These are breakpoints that set and unset by changing the contents of hardware registers. What will happen when these are set is that the process will raise an EXCEPTION_SINGLE_STEP exception upon hitting the instruction at the address, which the debugger will then process in a fashion nearly identical to the way it would in the previous section. Due to the small number limitation, these won’t be presented in the sample breakpoint code in this post, but may eventually be written about in the future for the sake of completeness. I have written about their usage for API hooking in a previous post (excuse the dead links in the beginning). The implementation for a debugger is very close to how it is presented there.

Software Breakpoints
This last class of breakpoints is performed entirely in software and is tied strongly to how the operating system functions. Another name for them is memory breakpoints. They combine some of the best features of interrupt breakpoints, namely the ability to have as many as you’d like, with the best features of hardware breakpoints, which is that nothing in the executing code needs to be overwritten. However, there is a major drawback: they add a significant slowdown to the execution of the code due to their implementation.

Instead of being implemented at the address level, these are implemented at the page level. How these work is that the address where a breakpoint will be set will have its page permissions changed to that of a guard page using VirtualProtectEx. When any instruction on the page will be accessed, there will be an EXCEPTION_GUARD_PAGE exception thrown. The debugger will handle this exception and check if the offending address is that of the breakpoint address. If so, the debugger can perform the usual handling/user prompt as with any other breakpoint. If not then the debugger must perform some extra steps.

According to the documentation, the guard page protection will be removed from the page after it is raised. This means that once the exception is handled and execution continues, any access afterwards will not generate an EXCEPTION_GUARD_PAGE exception. So in the case that the instruction accessed is not the desired breakpoint address, the breakpoint will be lost. To remedy this, the technique similar to the one presented in the hardware interrupt breakpoint section will be used. The processor will enter in to single-step mode and continue execution. On the next instruction, there will be an EXCEPTION_SINGLE_STEP exception raised. This will be handled by the debugger and the guard page property will be re-enabled on the page. This implementation also will not be covered in this post, but may be covered in the future. I have written about it before, also in the context of API hooking, here.

Implementations

As mentioned above, enabling and disabling hardware interrupt breakpoints is simply a matter of overwriting opcodes with 0xCC (INT 3). In Windows, this is accomplished in a pretty straightforward manner with the usage of ReadProcessMemory and WriteProcessMemory.

const bool InterruptBreakpoint::EnableBreakpoint()
{
    SIZE_T ulBytes = 0;
    bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess, (LPCVOID)m_dwAddress, &m_originalByte, sizeof(unsigned char), &ulBytes));
    if (bSuccess && ulBytes == sizeof(unsigned char))
    {
        bSuccess = BOOLIFY(WriteProcessMemory(m_hProcess, (LPVOID)m_dwAddress, &m_breakpointOpcode, sizeof(unsigned char), &ulBytes));
        return bSuccess && (ulBytes == sizeof(unsigned char));
    }
    else
    {
        fprintf(stderr, "Could not read from address %p. Error = %X\n", m_dwAddress, GetLastError());
    }
 
    return false;
}

The original byte at the target address is read and stored and then overwritten with 0xCC. Nothing too shocking or out of the ordinary. Disabling the breakpoint is simply done by performing the opposite and writing back the original byte.

const bool InterruptBreakpoint::DisableBreakpoint()
{
    SIZE_T ulBytes = 0;
    const bool bSuccess = BOOLIFY(WriteProcessMemory(m_hProcess, (LPVOID)m_dwAddress, &m_originalByte, sizeof(unsigned char), &ulBytes));
    if (bSuccess && ulBytes == sizeof(unsigned char))
    {
        return true;
    }
    fprintf(stderr, "Could not write back original opcode to address %p. Error = %X\n", m_dwAddress, GetLastError());
 
    return false;
}

Now with the ability to enable/disable breakpoints, it is time to look at the handlers. As mentioned, upon accessing the instruction where the breakpoint resides, an EXCEPTION_BREAKPOINT exception will be raised. There are a few steps here that need to be done:

  1. Check if the breakpoint is in the breakpoint list. This is done because when the debugger first attaches, an EXCEPTION_BREAKPOINT will be raised from the debuggers attaching thread (see previous post). We don’t care about this exception, so just skip it and continue execution
  2. If it is in the breakpoint list, and therefore a breakpoint that the user explicitly set, it needs to be disabled. As mentionted before, this is to allow the original instruction to be executed.
  3. Open a handle to the thread and get the thread context. Change the execution pointer (EIP or RIP) to point to the breakpoint address and also enable single-step mode. Set the thread context to this newly modified context
  4. Save this breakpoint. This is to re-enable it during the EXCEPTION_SINGLE_STEP exception that will be raised immediately after continuing execution.
  5. Prompt the user to continue or perform single-steps. Wait for their response and continue execution afterwards depending on the choice.

Put into code, it looks like the following:

Register(DebugExceptions::eBreakpoint, [&](const DEBUG_EVENT &dbgEvent)
{
    auto &exceptionRecord = dbgEvent.u.Exception.ExceptionRecord;
    const DWORD_PTR dwExceptionAddress = (DWORD_PTR)exceptionRecord.ExceptionAddress;
    fprintf(stderr, "Received breakpoint at address %p.\n", dwExceptionAddress);  
 
    Breakpoint *pBreakpoint = m_pDebugger->FindBreakpoint(dwExceptionAddress);
    if (pBreakpoint != nullptr)
    {
        if (pBreakpoint->Disable())
        {
            CONTEXT ctx = { 0 };
            ctx.ContextFlags = CONTEXT_ALL;
            HANDLE hThread = OpenThread(THREAD_GET_CONTEXT | THREAD_SET_CONTEXT, FALSE, dbgEvent.dwThreadId);
            if (hThread != NULL)
            {
                (void)GetThreadContext(hThread, &ctx);
#ifdef _M_IX86
                ctx.Eip = (DWORD_PTR)dwExceptionAddress;
#elif defined _M_AMD64
                ctx.Rip = (DWORD_PTR)dwExceptionAddress;
#else
#error "Unsupported architecture"
#endif
                ctx.EFlags |= 0x100;
                m_pDebugger->m_pLastBreakpoint = pBreakpoint;
                m_pDebugger->m_dwExecutingThreadId = dbgEvent.dwThreadId;
                (void)SetThreadContext(hThread, &ctx);
 
                fprintf(stderr, "Press c to continue or s to begin stepping.\n");
                (void)m_pDebugger->WaitForContinue();
            }
            else
            {
                fprintf(stderr, "Could not open handle to thread %p. Error = %X\n", dbgEvent.dwThreadId, GetLastError());
            }
        }
        else
        {
            fprintf(stderr, "Could not remove breakpoint at address %p.", dwExceptionAddress);
        }
    }
    SetContinueStatus(DBG_CONTINUE);
});

The handler for EXCEPTION_SINGLE_STEP exceptions serves two purposes: it is there to re-enable breakpoints that were just hit, and it is also there to allow the user to continue single-stepping execution of the program. As a result of this, there needs to be a flag declared that is set when the user wishes to enter single-step mode by themselves (instead of it being set through hitting a breakpoint). If the user is in single-step mode then show them a prompt and wait for a response to continue stepping or to continue execution entirely. Again, put into code, it looks like the following:

Register(DebugExceptions::eSingleStep, [&](const DEBUG_EVENT &dbgEvent)
{
    auto &exceptionRecord = dbgEvent.u.Exception.ExceptionRecord;
    const DWORD_PTR dwExceptionAddress = (DWORD_PTR)exceptionRecord.ExceptionAddress;
    fprintf(stderr, "Received single step at address %p\n", dwExceptionAddress);
    if (m_pDebugger->m_bIsStepping)
    {
        fprintf(stderr, "Press s to continue stepping.\n");
        m_pDebugger->m_dwExecutingThreadId = dbgEvent.dwThreadId;
        (void)m_pDebugger->WaitForContinue();
    }
    if (!m_pDebugger->m_pLastBreakpoint->IsEnabled())
    {
        (void)m_pDebugger->m_pLastBreakpoint->Enable();
    }
 
    SetContinueStatus(DBG_CONTINUE);
});

Debugger in action

With everything in place, the debugger is ready to be tested out. The easiest way is to write a sample program that will be attached to:

#include <stdio.h>
#include <Windows.h>
 
void TestFunction()
{
    printf("Hello, World!\n");
}
 
int main(int argc, char *argv[])
{
    printf("Test function address: %p\n", TestFunction);
    while (true)
    {
        getchar();
        TestFunction();
    }
 
    return 0;
}

On my machine, TestFunction resided at 0x13B1000. After attaching and setting a breakpoint on this address, the debugger was able to successfully step execution of the program or continue entirely.

a
Target address: 0x13B1000
Received breakpoint at address 13B1000.
Press c to continue or s to begin stepping.
s
Received single step at address 13B1001
Press s to continue stepping.
s
Received single step at address 13B1003
Press s to continue stepping.
s
Received single step at address 13B1009
Press s to continue stepping.
s
Received single step at address 13B100A
Press s to continue stepping.
c

The stepped addresses successfully matched up with the disassembly.

013B1000 55                   push        ebp  
013B1001 8B EC                mov         ebp,esp  
013B1003 81 EC C0 00 00 00    sub         esp,0C0h  
013B1009 53                   push        ebx  
013B100A 56                   push        esi  

This was also tested on an x64 version (with a x64 build of the debugger) with equal success.

Article Roadmap
Future posts will be related on topics closely following the items below:

  • Basics
  • Adding/Removing Breakpoints, Single-stepping
  • Call Stack, Registers, Contexts
  • Symbols
  • Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

November 27, 2014

Writing a Primitive Debugger: Part 1 (Basics)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 2:02 PM

As software developers (or reverse engineers), debuggers are an invaluable tool. They allow for runtime analysis and introspection/understanding of the program in order to find out how it works — or oftentimes doesn’t. This series of posts will go in to how they work and how to begin developing a primitive debugger targeting the Windows platform running on the x86 or x64 architectures.

Attaching/Detaching
In order to debug a process, a debugger must be able to attach to it. This means that there should be a way for the debugger to interact with the process in such a way that the debugger will have access to the processes address space, the ability to halt and continue execution, modify registers, and so on. Likewise, a debugger should be able to safety detach from a process and let it continue running when a debugging session is finished. On the Windows platform this is achieved by calling the DebugActiveProcess function and specifying the process identifier of the target. Alternatively, it can also be accomplished by calling CreateProcess with the DEBUG_PROCESS or DEBUG_ONLY_THIS_PROCESS creation flags. This latter method will create a new process and attach to it rather than attaching to one already running on the system. Once attached, the debugger can specify a policy on whether to kill the process on detach with DebugSetProcessKillOnExit. Lastly, the process for detaching is a straightforward call to DebugActiveProcessStop. Putting all of these together produces code similar to the following:

const bool Debugger::Start()
{
    m_bIsActive = BOOLIFY(DebugActiveProcess(m_dwProcessId));
    if (m_bIsActive)
    {
        const bool bIsSuccess = BOOLIFY(DebugSetProcessKillOnExit(m_bKillOnExit));
        if (!bIsSuccess)
        {
            fprintf(stderr, "Could not set process kill on exit policy. Error = %X\n", GetLastError());
        }
        return DebuggerLoop();
    }
    else
    {
        fprintf(stderr, "Could not debug process %X. Error = %X\n", m_dwProcessId, GetLastError());
    }
 
    return false;
}
 
const bool Debugger::Stop()
{
    m_bIsActive = BOOLIFY(DebugActiveProcessStop(m_dwProcessId));
    if (!m_bIsActive)
    {
        fprintf(stderr, "Could not stop debugging process %X. Error = %X\n", m_dwProcessId, GetLastError());
    }
 
    return m_bIsActive;
}

where m_dwProcessId and m_bKillOnExit are parameters provided through the Debugger constructor (see sample code). The BOOLIFY macro is also just a simple

#define BOOLIFY(x) !!(x)

definition to encourage type safety. At this point we have a simple debugger that can attach to and detach from a target process, but cannot handle any debug events. This is where the debugging loop comes in.

The Debugging Loop
The core component of any debugger is the debugging loop. This is the part responsible for waiting for a debug event, processing said event, and then waiting for the next event, hence the loop. On the Windows platform, this is pretty straightforward. It’s actually straightforward enough that Microsoft wrote up a short MSDN page on what needs to be done at this step. The steps involved are that (in a loop), the debugger calls WaitForDebugEvent which waits for a debug event from the process. These debug events are enumerated here and are commonly events related to process and thread creation/destruction, loading or unloading of DLLs, any exceptions raised, or debug strings output specifically for a debugger to see. Once this function returns, it will populate a DEBUG_EVENT structure with the information related to the particular debug event. At this point, it is the debuggers job to handle the event. After handling the event, the handlers must provide a continue status to ContinueDebugEvent, which is a code telling the thread that raised the event how to carry on execution after the event was handled. For most events, i.e. CREATE_PROCESS_DEBUG_EVENT, LOAD_DLL_DEBUG_EVENT, etc, you want to continue execution since these events do not reflect anything wrong with program behavior, but are events to notify the debugger of changing process state. This is done by choosing DBG_CONTINUE as the continue status. For the exceptional cases, such as exceptions which lead to undefined program behavior such as access violations, illegal instruction execution, divides by zero, etc, the process is nearing a point of no return. The debuggers job at this point is to gather and display information relating to the crash and in most cases terminate the process. This termination can happen inside the handler itself for these exceptions, or the debugger can choose to continue the debug event with the DBG_EXCEPTION_NOT_HANDLED continue status, meaning that the debugger is relinquishing responsibility of handling this exception properly. In almost all cases, this will lead to the program terminating on its own immediately afterwards. However, there are sometimes corner cases, particularly in malware, where the process will install its own runtime exception handler as an obfuscation technique and produce its own runtime exceptions to be handled within this handler to carry out some functionality. Continuing the exception in this case would not result in a crash since the process is able to handle its own exception after the debugger forwards it along the exception handler chain. Putting cases like those aside for now, a typical debugging loop may look like the following:

const bool Debugger::DebuggerLoop()
{
    DEBUG_EVENT dbgEvent = { 0 };
    DWORD dwContinueStatus = 0;
    bool bSuccess = false;
 
    while (m_bIsActive)
    {
        bSuccess = BOOLIFY(WaitForDebugEvent(&dbgEvent, INFINITE));
        if (!bSuccess)
        {
            fprintf(stderr, "WaitForDebugEvent returned failure. Error = %X\n", GetLastError());
            return false;
        }
 
        m_pEventHandler->Notify((DebugEvents)dbgEvent.dwDebugEventCode, dbgEvent);
        dwContinueStatus = m_pEventHandler->ContinueStatus();
 
        bSuccess = BOOLIFY(ContinueDebugEvent(dbgEvent.dwProcessId, dbgEvent.dwThreadId, dwContinueStatus));
        if (!bSuccess)
        {
            fprintf(stderr, "ContinueDebugEvent returned failure. Error = %X\n", GetLastError());
            return false;
        }
    }
 
    return true;
}

with m_pEventHandler being responsible for registering handlers and setting a continue status to be passed along to ContinueDebugEvent. The example code registers handlers for events/exceptions and outputs information relevant to each event/exception. The style below is followed for all events and exceptions:

Register(DebugEvents::eCreateThread, [&](const DEBUG_EVENT &dbgEvent)
{
    auto &info = dbgEvent.u.CreateThread;
    fprintf(stderr, "CREATE_THREAD_DEBUG_EVENT received.\n"
        "Handle: %X\n"
        "TLS base: %X\n"
        "Start address: %X\n",
        info.hThread, info.lpThreadLocalBase, info.lpStartAddress);
    SetContinueStatus(DBG_CONTINUE);
});
 
Register(DebugEvents::eCreateProcess, [&](const DEBUG_EVENT &dbgEvent)
{
    auto &info = dbgEvent.u.CreateProcessInfo;
    fprintf(stderr, "CREATE_PROCESS_DEBUG_EVENT received.\n"
        "Handle (image file): %X\n"
        "Handle (process): %X\n"
        "Handle (main thread): %X\n"
        "Image base address: %X\n"
        "Debug info file offset: %X\n"
        "Debug info size: %X\n"
        "TLS base: %X\n"
        "Start address: %X\n",
        info.hFile, info.hProcess, info.hThread, info.lpBaseOfImage,
        info.dwDebugInfoFileOffset, info.nDebugInfoSize, info.lpThreadLocalBase,
        info.lpStartAddress);
 
    m_hProcess = info.hProcess;
 
    SetContinueStatus(DBG_CONTINUE);
});

That should be all that is really needed to get  started on creating a basic debugger. This debugger features the ability attach/detach, handle debug events, and output information pertaining to these events. To test this out, we can create a simple program to generate an exception that will be caught by the debugger when it is attached.

#include <stdio.h>
#include <Windows.h>
 
int main(int argc, char *argv[])
{
    printf("Press enter to raise an exception.\n");
    (void)getchar();
    if (IsDebuggerPresent())
    {
        OutputDebugStringA("This should be seen by the debugger.\n");
        RaiseException(STATUS_ACCESS_VIOLATION, 0, 0, nullptr);
    }
    else
    {
        printf("Process was not being debugged.\n");
    }
 
    return 0;
}

Here is the output of the debugger upon attaching to the process and receiving the access violation:

CREATE_PROCESS_DEBUG_EVENT received.
Handle (image file): 4C
Handle (process): 48
Handle (main thread): 44
Image base address: EE0000
Debug info file offset: 0
Debug info size: 0
TLS base: 7F03F000
Start address: 0
LOAD_DLL_DEBUG_EVENT received.
Handle: 54
Base address: 77040000
Debug info file offset: 0
Debug info size: 0
Name: \\?\C:\Windows\SysWOW64\ntdll.dll
LOAD_DLL_DEBUG_EVENT received.
Handle: 5C
Base address: 76A00000
Debug info file offset: 0
Debug info size: 0
Name: \\?\C:\Windows\SysWOW64\kernel32.dll
LOAD_DLL_DEBUG_EVENT received.
Handle: 50
Base address: 765D0000
Debug info file offset: 0
Debug info size: 0
Name: \\?\C:\Windows\SysWOW64\KernelBase.dll
LOAD_DLL_DEBUG_EVENT received.
Handle: 60
Base address: F7F0000
Debug info file offset: 0
Debug info size: 0
Name: \\?\C:\Windows\SysWOW64\msvcr120d.dll
CREATE_THREAD_DEBUG_EVENT received.
Handle: 64
TLS base: 7F03C000
Start address: 770EBCFC
Received exception event.
First chance exception: 1
Exception code: 80000003
Exception flags: 0
Exception address: 770670BC
Number parameters (associated with exception): 1
Received breakpoint
EXIT_THREAD_DEBUG_EVENT received.
Thread 1B20 exited with code 0.
OUTPUT_DEBUG_STRING_EVENT received.
Debug string: This should be seen by the debugger.

Received exception event.
First chance exception: 1
Exception code: C0000005
Exception flags: 0
Exception address: 765E2F71
Number parameters (associated with exception): 0
Received access violation
Received exception event.
First chance exception: 0
Exception code: C0000005
Exception flags: 0
Exception address: 765E2F71
Number parameters (associated with exception): 0
Received access violation
EXIT_PROCESS_DEBUG_EVENT received.
Process 390 exited with code C0000005.

As you can see from the output, the debugger receives a CREATE_PROCESS_DEBUG_EVENT upon attaching. This event is always the first one triggered upon a debugger attaching and lets the debugger obtain a handle to the process that lets the debugger read/write from process memory, change the processes thread contexts, etc. It also may give information about any sort of debug information relevant to the process. Following that are the events related to any loaded DLLs in the process address space. Afterwards two interesting events come up. They are a CREATE_THREAD_DEBUG_EVENT and an exception with an exception code corresponding to EXCEPTION_BREAKPOINT. These are covered in the section below. Lastly, the debugger successfully displays the debug string provided by the process, and shows the access violation also being successfully received. Since the debugger sample code does not terminate the process, you can see the exception being raised multiple times. Initially it is raised as a first chance exception, but comes back around around as a second/last chance exception since the target process was not able to handle it. The process then terminates with a status code corresponding to EXCEPTION_ACCESS_VIOLATION.

What actually happens when a debugger is attached?
One current mystery about the debugger output may be where those CREATE_THREAD_DEBUG_EVENT, breakpoint, and EXIT_THREAD_DEBUG_EVENT events came from. These events are triggered as a result of the debugger attaching to the process. When the DebugActiveProcess is called, it forwards on to the NtDebugActiveProcess syscall. This syscall is responsible for setting up the process to be in a debugable state, which at the very least involves changing the BeingDebugged flag of the Process Environment Block for the target process — this is how the IsDebuggerPresent function works in the target process to check if it is being debugged. Afterwards, a thread will be created in the target process. This thread will have a start address that corresponds to an 0xCC (int 3) instruction, better known as a breakpoint on x86 and x64 architectures. This is why the debugger displays as having received a breakpoint. When execution is continued, this thread exits the process begins executing normally again.

Article Roadmap
Future posts will be related on topics closely following the items below:

  • Basics
  • Adding/Removing Breakpoints, Single-stepping
  • Call Stack, Registers, Contexts
  • Symbols
  • Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

May 2, 2014

Messing with MSN Internet Games (2/2)

Filed under: Game Hacking,General x86-64,Reverse Engineering — admin @ 2:35 PM

spades
The not-too-long-awaited followup continues. This post will outline some of the internals of how the common network code residing in zgmprxy.dll works. This DLL is shared across Internet Checkers, Internet Backgammon, and Internet Spades to carry out all of the network functionality. Fortunately, or rather unfortunately from a challenge perspective, Microsoft has provided debugging symbols for zgmprxy.dll. This removes some of the challenge in finding interesting functions, but does still allow for some decent reverse engineering knowledge to actually understand how everything is working.

Starting Point

The obvious starting point for this is to load and look through the zgmproxy.pdb file provided through the Microsoft Symbol Server. There are tons of good functions to look through, but for the sake of brevity, I will be focusing on four of them here.

?BeginConnect@CStadiumSocket@@QEAAJQEAGK@Z
?SendData@CStadiumSocket@@QEAAHPEADIHH@Z
?DecryptSocketData@CStadiumSocket@@AEAAJXZ
?Disconnect@CStadiumSocket@@QEAAXXZ

Understanding how name decorations work allows for a recovery of a large amount of information, such as parameter number any types, function name and class membership information, calling convention (__thiscall for this case obviously, although I treat it as __stdcall with the “this” pointer as the first parameter in the example code), etc.

The Plan

The plan here does not change too much from what happened in the previous post:

  • Get into the address space of the target executable. Nothing here changes from last post.
  • Get the addresses of the above functions. This becomes very simple with the debug/symbol APIs provided by the WinAPI.
  • Install hooks at desired places on the functions.
  • Save off the CStadiumSocket instance so we can call functions in it at our own leisure. As an example for this post, it will be to send custom chat messages instead of the pre-selected ones offered by the games.

DllMain does not change drastically from the last revision.

int APIENTRY DllMain(HMODULE hModule, DWORD dwReason, LPVOID lpReserved)
{
        switch(dwReason)
    {
    case DLL_PROCESS_ATTACH:
        (void)DisableThreadLibraryCalls(hModule);
        if(AllocConsole())
        {
            freopen("CONOUT$", "w", stdout);
            SetConsoleTitle(L"Console");
            SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE);
            printf("DLL loaded.\n");
        }
        if(GetFunctions())
        {
            pExceptionHandler = AddVectoredExceptionHandler(TRUE, VectoredHandler);
            if(SetBreakpoints())
            {
                if(CreateThread(NULL, 0, DlgThread, hModule, 0, NULL) == NULL)
                    printf("Could not create dialog thread. Last error = %X\n", GetLastError());
            }
            else
            {
                printf("Could not set initial breakpoints.\n");
            }
            printf("CStadiumSocket::BeginConnect: %016X\n"
                "CStadiumSocket::SendData: %016X\n"
                "CStadiumSocket::DecryptSocketData: %016X\n"
                "CStadiumSocket::Disconnect: %016X\n",
                BeginConnectFnc, SendDataFnc, DecryptSocketDataFnc, DisconnectFnc);
        }
        break;
 
    case DLL_PROCESS_DETACH:
        //Clean up here usually
        break;
 
    case DLL_THREAD_ATTACH:
        break;
 
    case DLL_THREAD_DETACH:
        break;
    }
 
    return TRUE;
}

There are four functions now as well as a new thread which will hold a dialog to enter custom chat (discussed later). Memory breakpoints are still used and nothing has changed about how they are added. GetFunctions() has drastically changed in this revision. Instead of finding the target functions through GetProcAddress, the injected DLL can load up symbols at runtime and find the four desired functions through the use of the SymGetSymFromName64 function.

const bool GetFunctions(void)
{
    (void)SymSetOptions(SYMOPT_UNDNAME);
    if(SymInitialize(GetCurrentProcess(), "", TRUE))
    {
        IMAGEHLP_SYMBOL64 imageHelp = { 0 };
        imageHelp.SizeOfStruct = sizeof(IMAGEHLP_SYMBOL64);
 
        (void)SymGetSymFromName64(GetCurrentProcess(), "CStadiumSocket::BeginConnect", &imageHelp);
        BeginConnectFnc = (pBeginConnect)imageHelp.Address;
 
        (void)SymGetSymFromName64(GetCurrentProcess(), "CStadiumSocket::SendData", &imageHelp);
        SendDataFnc = (pSendData)imageHelp.Address;
 
        (void)SymGetSymFromName64(GetCurrentProcess(), "CStadiumSocket::DecryptSocketData", &imageHelp);
        DecryptSocketDataFnc = (pDecryptSocketData)imageHelp.Address;  
 
        (void)SymGetSymFromName64(GetCurrentProcess(), "CStadiumSocket::Disconnect", &imageHelp);
        DisconnectFnc = (pDisconnect)imageHelp.Address;
 
    }
    else
    {
        printf("Could not initialize symbols. Last error = %X", GetLastError());
    }
    return ((BeginConnectFnc != NULL) && (SendDataFnc != NULL)
        && (DecryptSocketDataFnc != NULL) && (DisconnectFnc != NULL));
}

Here symbols will be loaded with undecorated names and the target functions will be retrieved. The zgmprxy.pdb file must reside in one of the directories that SymInitialize checks, namely in one of the following:

    The current working directory of the application
    The _NT_SYMBOL_PATH environment variable
    The _NT_ALTERNATE_SYMBOL_PATH environment variable

That is really all there is in terms of large changes from last post, so it’s time to begin actually reversing these four functions.

?BeginConnect@CStadiumSocket@@QEAAJQEAGK@Z

As the function name implies, this is called to begin a connection with the matchmaking service and game. The control flow graph looks pretty straightforward, as is the functionality of BeginConnect.

msncfgFrom a cursory inspection, the function appears to be a wrapper around QueueUserWorkItem. It takes a URL and port number as input, and is responsible for initializing and formatting them in a way before launching an asynchronous task. My x64 -> C interpretation yields something similar to the following (x64 code in comment form, my C translation below). Allocation sizes were retrieved during a trace and don’t necessarily fully reflect the logic:

int CStadiumSocket::BeginConnect(wchar_t *pUrl, unsigned long ulPortNumber)
{
//.text:000007FF34FB24C7                 mov     rcx, r12        ; size_t
//.text:000007FF34FB24CA                 call    ??_U@YAPEAX_K@Z ; operator new[](unsigned __int64)
//.text:000007FF34FB24CF                 mov     rsi, rax
//.text:000007FF34FB24D2                 cmp     rax, rbx
//.text:000007FF34FB24D5                 jnz     short loc_7FF34FB24E1
    wchar_t *strPortNum = new wchar_t[32];
    if(strPortNum == NULL)
        return 0x800404DB;
 
//.text:000007FF34FB24E1                 mov     r8, r12         ; size_t
//.text:000007FF34FB24E4                 xor     edx, edx        ; int
//.text:000007FF34FB24E6                 mov     rcx, rax        ; void *
//.text:000007FF34FB24E9                 call    memset
    memset(pBuffer, 0, 32 * sizeof(wchar_t));
 
//.text:000007FF34FB24EE                 lea     r12, [rbp+3Ch]
//.text:000007FF34FB24F2                 mov     r11d, 401h
//.text:000007FF34FB24F8                 mov     rax, r12
//.text:000007FF34FB24FB                 sub     rdi, r12
//.text:000007FF34FB24FE
//.text:000007FF34FB24FE loc_7FF34FB24FE:                        ; CODE XREF: CStadiumSocket::BeginConnect(ushort * const,ulong)+77j
//.text:000007FF34FB24FE                 cmp     r11, rbx
//.text:000007FF34FB2501                 jz      short loc_7FF34FB251E
//.text:000007FF34FB2503                 movzx   ecx, word ptr [rdi+rax]
//.text:000007FF34FB2507                 cmp     cx, bx
//.text:000007FF34FB250A                 jz      short loc_7FF34FB2519
//.text:000007FF34FB250C                 mov     [rax], cx
//.text:000007FF34FB250F                 add     rax, 2
//.text:000007FF34FB2513                 sub     r11, 1
//.text:000007FF34FB2517                 jnz     short loc_7FF34FB24FE
//.text:000007FF34FB2519
//.text:000007FF34FB2519 loc_7FF34FB2519:                        ; CODE XREF: CStadiumSocket::BeginConnect(ushort * const,ulong)+6Aj
//.text:000007FF34FB2519                 cmp     r11, rbx
//.text:000007FF34FB251C                 jnz     short loc_7FF34FB2522
//.text:000007FF34FB251E
//.text:000007FF34FB251E loc_7FF34FB251E:                        ; CODE XREF: CStadiumSocket::BeginConnect(ushort * const,ulong)+61j
//.text:000007FF34FB251E                 sub     rax, 2 
    for(unsigned int i = 0; i < 1025; ++i)
    {
        m_pBuffer[i] = pUrl[i];
        if(pBuffer[i] == 0)
            break;
    }
 
//.text:000007FF34FB2522                 mov     r9d, 0Ah        ; int
//.text:000007FF34FB2528                 mov     rdx, rsi        ; wchar_t *
//.text:000007FF34FB252B                 mov     ecx, r13d       ; int
//.text:000007FF34FB252E                 lea     r8d, [r9+16h]   ; size_t
//.text:000007FF34FB2532                 mov     [rax], bx
//.text:000007FF34FB2535                 call    _itow_s
    (void)_itow_s(ulPortNumber, strPortNum, 32, 10);
 
//.text:000007FF34FB253A                 mov     [rbp+38h], r13d
//.text:000007FF34FB253E                 mov     r13d, 30h
//.text:000007FF34FB2544                 lea     rcx, [rsp+68h+var_48] ; void *
//.text:000007FF34FB2549                 mov     r8, r13         ; size_t
//.text:000007FF34FB254C                 xor     edx, edx        ; int
//.text:000007FF34FB254E                 mov     [rbp+85Ch], ebx
//.text:000007FF34FB2554                 call    memset
    char partialContextBuffer[48];
    memset(str, 0, sizeof(str));
 
//.text:000007FF34FB2559                 lea     ecx, [r13+28h]  ; size_t
//.text:000007FF34FB255D                 mov     [rsp+68h+var_44], ebx
//.text:000007FF34FB2561                 mov     [rsp+68h+var_40], 1
//.text:000007FF34FB2569                 call    ??2@YAPEAX_K@Z  ; operator new(unsigned __int64)
//.text:000007FF34FB256E                 mov     rdi, rax
//.text:000007FF34FB2571                 cmp     rax, rbx
//.text:000007FF34FB2574                 jz      short loc_7FF34FB257E
//.text:000007FF34FB2576                 mov     dword ptr [rax], 1
//.text:000007FF34FB257C                 jmp     short loc_7FF34FB2581
    char *pContextBuffer = new char[88]; 
    if(pContextBuffer == NULL)
        return 0x800404DB;
 
//.text:000007FF34FB2586                 lea     rcx, [rdi+18h]  ; void *
//.text:000007FF34FB258A                 lea     rdx, [rsp+68h+var_48] ; void *
//.text:000007FF34FB258F                 mov     r8, r13         ; size_t
//.text:000007FF34FB2592                 mov     [rdi+8], r12
//.text:000007FF34FB2596                 mov     [rdi+10h], rsi
//.text:000007FF34FB259A                 call    memmove
    *(pContextBuffer) = 1; //At 000007FF34FB2576
    *(pContextBuffer + 8) = &m_pBuffer;
    *(pContextBuffer + 16) = &strPortNum;
    memmove(&pContextBuffer[24], partialContextBuffer, 48);
 
//.text:000007FF34FB259F                 lea     r11, [rbp+0A80h]
//.text:000007FF34FB25A6                 lea     rax, [rbp+18h]
//.text:000007FF34FB25AA                 lea     rcx, ?AsyncGetAddrInfoW@CStadiumSocket@@SAKPEAX@Z ; Function
//.text:000007FF34FB25B1                 xor     r8d, r8d        ; Flags
//.text:000007FF34FB25B4                 mov     rdx, rdi        ; Context
//.text:000007FF34FB25B7                 mov     [rdi+48h], r11
//.text:000007FF34FB25BB                 mov     [rdi+50h], rax
//.text:000007FF34FB25BF                 call    cs:__imp_QueueUserWorkItem
//.text:000007FF34FB25C5                 cmp     eax, ebx
//.text:000007FF34FB25C7                 jnz     short loc_7FF34FB25D5
//.text:000007FF34FB25C9                 mov     ebx, 800404BFh
//.text:000007FF34FB25CE                 jmp     short loc_7FF34FB25D5
    if(QueueUserWorkItem(&AsyncGetAddrInfo, pContextBuffer, 0) == FALSE)
        return 0x800404BF;
 
//From success case
    return 0;
}

?SendData@CStadiumSocket@@QEAAHPEADIHH@Z

The next function to look at is the SendData function. This function formats the data to send and invokes OnASyncDataWrite to write it out. The function creates a buffer of max length 0x4010 (16400) bytes, copies in the message buffer, and appends a few fields to the end. There is some handling code in the event that the message is of a handshake type, or if it is a message that is to be queued up. Below is a mostly complete translation of the assembly.

int CStadiumSocket::SendData(char *pBuffer, unsigned int uiLength, bool bIsHandshake, bool bLastHandshake)
{
//.text : 000007FF34FB350C                 cmp     dword ptr[rcx + 0A88h], 0
//.text : 000007FF34FB3513                 mov     rax, [rcx + 840h]
//.text : 000007FF34FB351A                 mov     r13, rdx
//.text : 000007FF34FB351D                 mov     rax, [rax + 10h]
//.text : 000007FF34FB3521                 lea     rdx, aTrue; "true"
//.text : 000007FF34FB3528                 mov     rdi, rcx
//.text : 000007FF34FB352B                 mov[rsp + 58h + var_20], rax
//.text : 000007FF34FB3530                 lea     r11, aFalse; "false"
//.text : 000007FF34FB3537                 mov     ebp, r8d
//.text : 000007FF34FB353A                 mov     r10, r11
//.text : 000007FF34FB353D                 mov     rcx, r11
//.text : 000007FF34FB3540                 mov     r12d, r9d
//.text : 000007FF34FB3543                 cmovnz  r10, rdx
//.text : 000007FF34FB3547                 cmp[rsp + 58h + arg_20], 0
//.text : 000007FF34FB354F                 cmovnz  rcx, rdx
//.text : 000007FF34FB3553                 test    r9d, r9d
//.text : 000007FF34FB3556                 mov[rsp + 58h + var_28], r10
//.text : 000007FF34FB355B                 mov[rsp + 58h + var_30], rcx
//.text : 000007FF34FB3560                 cmovnz  r11, rdx
//.text : 000007FF34FB3564                 mov     r9d, r8d
//.text : 000007FF34FB3567                 lea     rcx, aCstadiumsoc_15; "CStadiumSocket::SendData:\n    BUFFER:  "...
//.text : 000007FF34FB356E                 mov     r8, r13
//.text : 000007FF34FB3571                 mov     edx, ebp
//.text : 000007FF34FB3573                 mov[rsp + 58h + var_38], r11
//.text : 000007FF34FB3578                 call ? SafeDbgLog@@YAXPEBGZZ; SafeDbgLog(ushort const *, ...)
    QueueNode *pQueueNode = m_msgQueue;
 
    char *strIsHandshake = (bIsHandshake == 0) ? "true" : "false";
    char *strPostHandshake = (m_bPostHandshake == 0) ? "true" : "false";
    char *strLastHandshake = (bLastHandshake == 0) ? "true" : "false";
 
    SafeDbgLog("CStadiumSocket::SendData:    BUFFER:    \"%*.S\"    LENGTH:    %u    HANDSHAKE: %s    LAST HS:   %s    POST HS:   %s    Queue:     %u",
        uiLength, pBuffer, uiLength, strIsHandshake, strLastHandshake, strPostHandshake, pQueueNode.Count);
 
//.text : 000007FF34FB357D                 mov     ecx, 4010h; size_t
//.text : 000007FF34FB3582                 call ? ? 2@YAPEAX_K@Z; operator new(unsigned __int64)
//.text : 000007FF34FB3587                 mov     rsi, rax
//.text : 000007FF34FB358A                 mov[rsp + 58h + arg_0], rax
//.text : 000007FF34FB358F                 test    rax, rax
//.text : 000007FF34FB3592                 jz      loc_7FF34FB36B3
//.text : 000007FF34FB3598                 mov     ebx, 4000h
//.text : 000007FF34FB359D                 xor     edx, edx; int
//.text : 000007FF34FB359F                 mov     rcx, rax; void *
//.text : 000007FF34FB35A2                 mov     r8, rbx; size_t
//.text : 000007FF34FB35A5                 call    memset
//.text : 000007FF34FB35AA                 cmp     ebp, ebx
//.text : 000007FF34FB35AC                 mov     rdx, r13; void *
//.text : 000007FF34FB35AF                 cmovb   rbx, rbp
//.text : 000007FF34FB35B3                 mov     rcx, rsi; void *
//.text : 000007FF34FB35B6                 mov     r8, rbx; size_t
//.text : 000007FF34FB35B9                 call    memmove
//.text : 000007FF34FB35BE                 and     dword ptr[rsi + 4000h], 0
//.text : 000007FF34FB35C5                 mov[rsi + 4004h], ebp
//.text : 000007FF34FB35CB                 mov[rsi + 4008h], r12d
//.text : 000007FF34FB35D2                 and     dword ptr[rsi + 400Ch], 0
    char *pFullBuffer = new char[0x4010];
    if(pFullBuffer == NULL)
    {
        return 0;
    }
 
    memset(pFullBuffer, 0, 0x4000);
 
    uiLength = (uiLength < 0x4000) ? uiLength : 0x4000;
    memmove(pFullBuffer, pBuffer, uiLength);
 
    pFullBuffer[0x4000] = 0;
    pFullBuffer[0x4004] = uiLength;
    pFullBuffer[0x4008] = bPostHandshake;
    pFullBuffer[0x400C] = 0;
 
//.text : 000007FF34FB35D9                 test    r12d, r12d
//.text : 000007FF34FB35DC                 jz      short loc_7FF34FB3658
//.text : 000007FF34FB35DE                 mov     rax, [rdi + 840h]
//.text : 000007FF34FB35E5                 mov     rbx, [rax]
//.text : 000007FF34FB35E8                 test    rbx, rbx
//.text : 000007FF34FB35EB
//.text : 000007FF34FB35EB loc_7FF34FB35EB : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 119j
//.text : 000007FF34FB35EB                 jz      short loc_7FF34FB364F
//...
//.text : 000007FF34FB364F loc_7FF34FB364F : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) : loc_7FF34FB35EBj
//.text : 000007FF34FB364F                 lea     rcx, aCstadiumsoc_18; "CStadiumSocket::SendData: AddTail in se"...
//.text : 000007FF34FB3656                 jmp     short loc_7FF34FB365F
//.text : 000007FF34FB3658; -------------------------------------------------------------------------- -
//.text : 000007FF34FB3658
//.text : 000007FF34FB3658 loc_7FF34FB3658 : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + E8j
//.text : 000007FF34FB3658                 lea     rcx, aCstadiumsock_9; "CStadiumSocket::SendData: AddTail\n\n"
//.text : 000007FF34FB365F
//.text : 000007FF34FB365F loc_7FF34FB365F : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 162j
//.text : 000007FF34FB365F                 call ? SafeDbgLog@@YAXPEBGZZ; SafeDbgLog(ushort const *, ...)
    bool bAddTail = (!bPostHandshake || pQueueNode->Prev == NULL);
    if(!bPostHandshake)
    {
        SafeDbgLog("CStadiumSocket::SendData: AddTail\n\n");
    }
    else if(pQueueNode->Prev == NULL)
    {
        SafeDbgLog("CStadiumSocket::SendData: AddTail in search.");
    }
 
//.text : 000007FF34FB3664                 mov     rbx, [rdi + 840h]
//.text : 000007FF34FB366B                 lea     rdx, [rsp + 58h + arg_0]
//.text : 000007FF34FB3670                 mov     r8, [rbx + 8]
//.text : 000007FF34FB3674                 xor     r9d, r9d
//.text : 000007FF34FB3677                 mov     rcx, rbx
//.text : 000007FF34FB367A                 call ? NewNode@ 
//.text : 000007FF34FB367F                 mov     rcx, [rbx + 8]
//.text : 000007FF34FB3683                 test    rcx, rcx
//.text : 000007FF34FB3686                 jz      short loc_7FF34FB368D
//.text : 000007FF34FB3688 loc_7FF34FB3688 : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 149j
//.text : 000007FF34FB3688                 mov[rcx], rax
//.text : 000007FF34FB368B                 jmp     short loc_7FF34FB3690
//.text : 000007FF34FB368D; -------------------------------------------------------------------------- -
//.text : 000007FF34FB368D
//.text : 000007FF34FB368D loc_7FF34FB368D : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 192j
//.text : 000007FF34FB368D
    if(bAddTail)
    {
        QueueNode *pNewNode = ATL::CAtlList::NewNode(pQueueNode->Top, pQueueNode->Prev, pQueueNode->Next);
        if(pQueueNode->Next == NULL)
        {
            pQueueNode->Next = pNewNode;
        }
        else
        {
            pQueueNode = pNewNode;
        }        
    }
 
//.text : 000007FF34FB3690                 cmp[rsp + 58h + arg_20], 0
//.text : 000007FF34FB3698                 mov[rbx + 8], rax
//.text : 000007FF34FB369C                 mov     ebx, 1
//.text : 000007FF34FB36A1                 jz      short loc_7FF34FB36A9
//.text : 000007FF34FB36A3                 mov[rdi + 0A88h], ebx
//.text : 000007FF34FB36A9
//.text : 000007FF34FB36A9 loc_7FF34FB36A9 : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 1ADj
//.text : 000007FF34FB36A9                 mov     rcx, rdi
//.text : 000007FF34FB36AC                 call ? OnAsyncDataWrite@CStadiumSocket@@AEAAXXZ; CStadiumSocket::OnAsyncDataWrite(void)
        pQueueNode->Next = pQueueNode;
        m_bPostHandshake = bLastHandshake;
        OnASyncDataWrite();
    }
 
//.text : 000007FF34FB35EB                 jz      short loc_7FF34FB364F
//.text : 000007FF34FB35ED                 test    rbx, rbx
//.text : 000007FF34FB35F0                 jz      short loc_7FF34FB3644
//.text : 000007FF34FB35F2                 mov     rcx, [rbx + 10h]
//.text : 000007FF34FB35F6                 mov     rax, [rbx]
//.text : 000007FF34FB35F9                 test    rcx, rcx
//.text : 000007FF34FB35FC                 jz      short loc_7FF34FB3607
//.text : 000007FF34FB35FE                 cmp     dword ptr[rcx + 4008h], 0
//.text : 000007FF34FB3605                 jz      short loc_7FF34FB360F
//.text : 000007FF34FB3607
//.text : 000007FF34FB3607 loc_7FF34FB3607 : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 108j
//.text : 000007FF34FB3607                 mov     rbx, rax
//.text : 000007FF34FB360A                 test    rax, rax
//.text : 000007FF34FB360D                 jmp     short loc_7FF34FB35EB
//.text : 000007FF34FB360F; -------------------------------------------------------------------------- -
//.text : 000007FF34FB360F
//.text : 000007FF34FB360F loc_7FF34FB360F : ; CODE XREF : CStadiumSocket::SendData(char *, uint, int, int) + 111j
//.text : 000007FF34FB360F                 lea     rcx, aCstadiumsoc_28; "CStadiumSocket::SendData: InsertBefore "...
//.text : 000007FF34FB3616                 call ? SafeDbgLog@@YAXPEBGZZ; SafeDbgLog(ushort const *, ...)
    else if(bPostHandshake)
    {
        pQueueNode *pNodePtr = pQueueNode;
        while(pNodePtr->Next != NULL)
        {
            pNodePtr = pNodePtr->Next;
            if(pNodePtr.pData[0x4008] == 0)
            {
                break;
            } 
        }
        SafeDbgLog("CStadiumSocket::SendData: InsertBefore in search.");
 
//.text : 000007FF34FB361B                 mov     rsi, [rdi + 840h]
//.text : 000007FF34FB3622                 mov     r8, [rbx + 8]
//.text : 000007FF34FB3626                 lea     rdx, [rsp + 58h + arg_0]
//.text : 000007FF34FB362B                 mov     rcx, rsi
//.text : 000007FF34FB362E                 mov     r9, rbx
//.text : 000007FF34FB3631                 call ? NewNode@ 
//.text : 000007FF34FB3636                 mov     rcx, [rbx + 8]
//.text : 000007FF34FB363A                 test    rcx, rcx
//.text : 000007FF34FB363D                 jnz     short loc_7FF34FB3688
//.text : 000007FF34FB363F                 mov     [rsi], rax
//.text : 000007FF34FB3642                 jmp     short loc_7FF34FB3690
        QueueNode *pNewNode = ATL::CAtlList::NewNode(pQueueNode->Top, pQueueNode->Prev, pQueueNode->Next);
        //Follows same insertion logic, except for ->Prev. Sets handshake flag again.
        OnASyncDataWrite();
    }
}

The logic looks rather complicated, but it the overall picture is that this function is responsible for scheduling of messages leaving the network and tags them with their type (handshake or not). It allocates and writes the buffer to send out and inserts it in to the message queue, which is read by OnASyncDataWrite and sent out after adding the encryption layer. Hooking this function will allow for the filtering of messages leaving the client for purposes of logging, fuzzing/modification, or other suitable purposes.

?DecryptSocketData@CStadiumSocket@@AEAAJXZ

This function is responsible for decrypting socket data after it comes in over the network from the server. In the case that the client is sending packets, CStadiumSocket::SendData is called, which in turn calls CStadiumSocket::OnASyncDataWrite; correspondingly the reverse happens in the receive case, and a CStadiumSocket::OnASyncDataRead function calls CStadiumSocket::DecryptSocketData. The internal works of this function are not necessarily important, and I will omit my x64 -> C conversion notes. The important part is to get a pointer to the buffer that has been decrypted. Doing so will allow for monitoring of messages coming from the server and like the SendData case, allows for logging or fuzzing of incoming messages to test client robustness. Doing some runtime tracing of this function, I found a good spot to pull the decrypted data from:

//.text : 000007FF34FB3D20                 movsxd  rcx, dword ptr[rdi + 400Ch]
.text : 000007FF34FB3D27                 mov     r8d, [r12]; size_t
.text : 000007FF34FB3D2B                 mov     rdx, [r12 + 8]; void *
.text : 000007FF34FB3D30                 add     rcx, rdi; void *
.text : 000007FF34FB3D33                 call    memmove

After the call to memmove, RDX will contain the decrypted buffer, with R8 containing the size. This seems like the perfect place to set the hook, at CStadiumSocket::DecryptSocketData + 0x1C3.

?DecryptSocketData@CStadiumSocket@@AEAAJXZ

The last function to look at. What happens here is also not necessarily important for our needs; looking through the assembly, it send out a “goodbye” message, what internally is referred to as a SEC_HANDSHAKE by the application, and shuts down send operations on the socket. Messages are still received and written out to the debug log (in the event that debug logging is enabled), and the socket is fully shut down and cleaned up after nothing is left to receive. This function is only hooked if we plan on doing something across multiple games in the same program instance, e.g. we resign a game and start a new one without restarting the application. Seeing this function called allows us to know that the CStadiumSocket instance captured by CStadiumSocket::BeginConnect is no longer valid for use.

Wrapping Up

Having all of this done and analyzed, changing the vectored exception handler to hook these functions (or in the middle of a function in the case of CStadiumSocket::DecryptSocketData) is just as simple as it was in the last post:

LONG CALLBACK VectoredHandler(EXCEPTION_POINTERS *pExceptionInfo)
{
    if(pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION)
    {        
        pExceptionInfo->ContextRecord->EFlags |= 0x100;
 
        DWORD_PTR dwExceptionAddress = (DWORD_PTR)pExceptionInfo->ExceptionRecord->ExceptionAddress;
        CONTEXT *pContext = pExceptionInfo->ContextRecord;
 
        if(dwExceptionAddress == (DWORD_PTR)BeginConnectFnc)
        {
            pThisPtr = (void *)pContext->Rcx;
            printf("Starting connection. CStadiumSocket instance is at: %016X\n", pThisPtr);
        }
        else if(dwExceptionAddress == (DWORD_PTR)SendDataFnc)
        {
            DWORD_PTR *pdwParametersBase = (DWORD_PTR *)(pContext->Rsp + 0x28);
            SendDataHook((void *)pContext->Rcx, (char *)pContext->Rdx, (unsigned int)pContext->R8, (int)pContext->R9, (int)(*(pdwParametersBase)));
        }
        else if(dwExceptionAddress == (DWORD_PTR)DecryptSocketDataFnc + 0x1C3)
        {
            DecryptSocketDataHook((char *)pContext->Rdx, (unsigned int)pContext->R8);
        }
        else if(dwExceptionAddress == (DWORD_PTR)DisconnectFnc)
        {
            printf("Closing connection. CStadiumSocket instance is being set to NULL\n");
            pThisPtr = NULL;
        }
 
        return EXCEPTION_CONTINUE_EXECUTION;
    }
 
    if(pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP)
    {
        (void)SetBreakpoints();
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

To have some fun, the injected DLL can create a dialog box for chat input and send it over to the server. The game server expects a numeric value corresponding to the allowed chat in the scrollbox, but does not do any checking on it. This allows for any arbitrary message to be sent over to the server and the player on the other side will see it. The only caveat is that spaces (0x20) characters must be converted to %20. The code is as follows

INT_PTR CALLBACK DialogProc(HWND hwndDlg, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    switch(uMsg)
    {
    case WM_COMMAND:
        switch(LOWORD(wParam))
        {
            case ID_SEND:
            {
                //Possible condition here where Disconnect is called while custom chat message is being sent.
                if(pThisPtr != NULL)
                {
                    char strSendBuffer[512] = { 0 };
                    char strBuffer[256] = { 0 };
                    GetDlgItemTextA(hwndDlg, IDC_CHATTEXT, strBuffer, sizeof(strBuffer) - 1);
 
					//Extremely unsafe example code, careful...
					for (unsigned int i = 0; i < strlen(strBuffer); ++i)
					{
						if (strBuffer[i] == ' ')
						{
							memmove(&strBuffer[i + 3], &strBuffer[i + 1], strlen(&strBuffer[i]));
							strBuffer[i] = '%';
							strBuffer[i + 1] = '2';
							strBuffer[i + 2] = '0';
						}
					}
 
                    _snprintf(strSendBuffer, sizeof(strSendBuffer) - 1,
                        "CALL Chat sChatText=%s&sFontFace=MS%%20Shell%%20Dlg&arfFontFlags=0&eFontColor=12345&eFontCharSet=1\r\n",
                        strBuffer);
 
                    SendDataFnc(pThisPtr, strSendBuffer, (unsigned int)strlen(strSendBuffer), 0, 1);
                }
            }
                break;
        }
    default:
        return FALSE;
    }
    return TRUE;
}
 
DWORD WINAPI DlgThread(LPVOID hModule)
{
    return (DWORD)DialogBox((HINSTANCE)hModule, MAKEINTRESOURCE(DLG_MAIN), NULL, DialogProc);
}

Here is an example of it at work:
customchat

Additional Final Words

Some other fun things to mess with:

  • Logging can be enabled by patching out
.text : 000007FF34FAB6FA                 cmp     cs : ? m_loggingEnabled@@3_NA, 0; bool m_loggingEnabled
.text : 000007FF34FAB701                 jz      short loc_7FF34FAB77E

and creating a “LoggingEnabled” expandable string registry key at HKEY_CURRENT_USER/Software/Microsoft/zone.com. The logs provide tons of debug output about the internal state changes of the application, e.g.

[Time: 05-01-2014 21:48:59.253]
CStadiumProxyBase::SetInternalState:
    OLD STATE:    0 (IST_NOT_CONNECTED)
    NEW STATE:    2 (IST_JOIN_PENDING)
    NEW STATUS:   1 (STADIUM_CONNECTION_CONNECTING)
    LIGHT STATUS: 0 (STADIUM_CONNECTION_NOT_CONNECTED)
    m_pFullState:    0x00000000
  • The values in the ZS_PublicELO and ZS_PrivateELO tags can be modified to be much higher values. If you do this on two clients you are guaranteed a match against yourself, unless someone else is also doing this.
  • The games have some cases where they do not perform full santization of game state, so making impossible moves is sometimes allowed.

The full source code relating to this can be found here.

April 30, 2014

Messing with MSN Internet Games (1/2)

Filed under: Game Hacking,General x86-64,Reverse Engineering — admin @ 12:08 AM

This post will entail the fun endeavors of reverse engineering the default MSN Internet Games that come with most “Professional” and higher versions of Windows (although discontinued from Windows 8 onwards). Namely the common protocol shared by Internet Backgammon, Internet Checkers, and Internet Spades.

backgammonUpon launching the game and connecting with another player, the first thing to do is to check what port everything is running on. In this case, it was port 443, which is the port most commonly used for SSL. This has the advantage of giving away a known protocol, but the disadvantage of not being able to read/modify any of the outgoing data. It can also mean that there is a custom protocol that is encrypted and has an SSL layer added on top before going out, but fortunately that is not the case here (spoilers).

ipStarting Point

Since SSL consists of part of the network code, the most logical place to start is in those respective modules which carry out the work: ncrypt.dll and bcrypt.dll. The prime target here is the SslEncryptPacket function. Presumably, this function will be called somewhere in the chain leading up in to the packet leaving the client. Per MSDN, two of the parameters for the function are:

pbInput [in]

    A pointer to the buffer that contains the packet to be encrypted.
cbInput [in]

    The length, in bytes, of the pbInput buffer.

If we can intercept the function call and inspect those parameters, there is a chance of being able to view the data that is leaving the client. If not, then inspecting further down the call stack will eventually lead to the plaintext anyway. There is also a corresponding SslDecryptPacket function which will serve as a starting point to getting and inspecting server responses.

The Plan

The plan of action is pretty straightforward.

  • Get into the address space of the target executable. This will be done through a simple DLL injection.
  • Find the target function for encrypting data (SslEncryptPacket) and decrypting data (follow call from SslDecryptPacket down).
  • Install hooks on these two functions. The chosen method will be through memory breakpoints.
  • Inspect the contents of incoming and outgoing messages in plaintext. Determine the protocol and begin messing with it.

The first step won’t be covered here due to the hundreds of different DLL injection tutorials/guides/tools already out there. The code in the injected DLL will be a pretty direct translation of the above steps. Something akin to the code below:

int APIENTRY DllMain(HMODULE hModule, DWORD dwReason, LPVOID lpReserved)
{
    switch(dwReason)
    {
    case DLL_PROCESS_ATTACH:
        (void)DisableThreadLibraryCalls(hModule);
        if(AllocConsole())
        {
            freopen("CONOUT$", "w", stdout);
            SetConsoleTitle(L"Console");
            SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE);
            printf("DLL loaded.\n");
        }
        if(GetFunctions())
        {
            pExceptionHandler = AddVectoredExceptionHandler(TRUE, VectoredHandler);
            if(SetBreakpoints())
            {
                printf("BCryptHashData: %016X\n"
                    "SslEncryptPacket: %016X\n",
                    BCryptHashDataFnc, SslEncryptPacketFnc);
            }
            else
            {
                printf("Could not set initial breakpoints.\n");
            }
        }
        break;
 
    case DLL_PROCESS_DETACH:
        //Clean up here usually
        break;
 
    case DLL_THREAD_ATTACH:
        break;
 
    case DLL_THREAD_DETACH:
        break;
    }
 
    return TRUE;
}

A “debug console” instance is created to save effort on having to attach a debugger in each testing instance. Pointers to the desired functions are then retrieved through the GetFunctions() function, and lastly memory breakpoints are installed on the two functions (encryption/decryption) to monitor the data being passed to them. For those wondering where BCryptHashData came from, it was traced down from SslDecryptData. It is actually called on both encryption/decryption, but will serve as the point of monitoring received messages from the server (in this post at least).

The second step is very easy and straightforward. By injecting a DLL into the process, we have full access to the process address space, and it is a simple matter of calling GetProcAddress on the desired target functions. This becomes basic WinAPI knowledge.

FARPROC WINAPI GetExport(const HMODULE hModule, const char *pName)
{
    FARPROC pRetProc = (FARPROC)GetProcAddress(hModule, pName);
    if(pRetProc == NULL)
    {
        printf("Could not get address of %s. Last error = %X\n", pName, GetLastError());
    }
 
    return pRetProc;
}
 
const bool GetFunctions(void)
{
    HMODULE hBCryptDll = GetModuleHandle(L"bcrypt.dll");
    HMODULE hNCryptDll = GetModuleHandle(L"ncrypt.dll");
    if(hBCryptDll == NULL)
    {
        printf("Could not get handle to Bcrypt.dll. Last error = %X\n", GetLastError());
        return false;
    }
    if(hNCryptDll == NULL)
    {
        printf("Could not get handle to Bcrypt.dll. Last error = %X\n", GetLastError());
        return false;
    }
    printf("Module handle: %016X\n", hBCryptDll);
 
    BCryptHashDataFnc = (pBCryptHashData)GetExport(hBCryptDll, "BCryptHashData");
    SslEncryptPacketFnc = (pSslEncryptPacket)GetExport(hNCryptDll, "SslEncryptPacket");
 
    return ((BCryptHashDataFnc != NULL) && (SslEncryptPacketFnc != NULL));
}

Installing the hooks (via memory breakpoints) is just an adaptation of the previous post on it. The code looks as follows:

const bool AddBreakpoint(void *pAddress)
{
    SIZE_T dwSuccess = 0;
 
    MEMORY_BASIC_INFORMATION memInfo = { 0 };
    dwSuccess = VirtualQuery(pAddress, &memInfo, sizeof(MEMORY_BASIC_INFORMATION));
    if(dwSuccess == 0)
    {
        printf("VirtualQuery failed on %016X. Last error = %X\n", pAddress, GetLastError());
        return false;
    }
 
    DWORD dwOldProtections = 0;
    dwSuccess = VirtualProtect(pAddress, sizeof(DWORD_PTR), memInfo.Protect | PAGE_GUARD, &dwOldProtections);
    if(dwSuccess == 0)
    {
        printf("VirtualProtect failed on %016X. Last error = %X\n", pAddress, GetLastError());
        return false;
    }
 
    return true;
}
 
const bool SetBreakpoints(void)
{
    bool bRet = AddBreakpoint(BCryptHashDataFnc);
    bRet &= AddBreakpoint(SslEncryptPacketFnc);
 
    return bRet;
}
 
LONG CALLBACK VectoredHandler(EXCEPTION_POINTERS *pExceptionInfo)
{
    if(pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION)
    {        
        pExceptionInfo->ContextRecord->EFlags |= 0x100;
 
        DWORD_PTR dwExceptionAddress = (DWORD_PTR)pExceptionInfo->ExceptionRecord->ExceptionAddress;
        CONTEXT *pContext = pExceptionInfo->ContextRecord;
 
        if(dwExceptionAddress == (DWORD_PTR)SslEncryptPacketFnc)
        {
            DWORD_PTR *pdwParametersBase = (DWORD_PTR *)(pContext->Rsp + 0x28);
            SslEncryptPacketHook((NCRYPT_PROV_HANDLE)pContext->Rcx, (NCRYPT_KEY_HANDLE)pContext->Rdx, (PBYTE *)pContext->R8, (DWORD)pContext->R9,
                (PBYTE)(*(pdwParametersBase)), (DWORD)(*(pdwParametersBase + 1)), (DWORD *)(*(pdwParametersBase + 2)), (ULONGLONG)(*(pdwParametersBase + 3)),
                (DWORD)(*(pdwParametersBase + 4)), (DWORD)(*(pdwParametersBase + 5)));
        }
        else if(dwExceptionAddress == (DWORD_PTR)BCryptHashDataFnc)
        {
            BCryptHashDataHook((BCRYPT_HASH_HANDLE)pContext->Rcx, (PUCHAR)pContext->Rdx, (ULONG)pContext->R8, (ULONG)pContext->R9);
        }
 
        return EXCEPTION_CONTINUE_EXECUTION;
    }
 
    if(pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP)
    {
        (void)SetBreakpoints();
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

checkersSoftware breakpoints will be set on the memory page that SslEncryptPacket and BCryptHashData are on. When these are hit a STATUS_GUARD_PAGE_VIOLATION will be raised and caught by the topmost vectored exception handler that the injected DLL installed upon load. The exception address will be checked against the two desired target addresses (SslEncryptPacket/BCryptHashData) and an inspection function will be called. In this case it will just echo the contents of the plaintext data buffers out to the debug console instance.  The single-step flag will be set so the program can continue execution by one instruction before raising a STATUS_SINGLE_STEP exception, upon which the memory breakpoints will be reinstalled (since guard page flags are cleared after the page gets accessed). For a more in-depth explanation, see the linked post related to memory breakpoints posted before on this blog.

The x64 ABI (on Windows) stores the first four parameters in RCX, RDX, R8, and R9 respectively, and the rest on the stack. There is no need to worry about locating any extra parameters in the case of BCryptHashData, which only takes four. However, SslEncryptData takes ten parameters, so there are another six to locate. In this case, there is no reason to care beyond the fourth parameter, but all of them are passed in for the sake of completeness. The base of the parameters on the stack were found by looking at how the function is called and verifying with a debugger during runtime.

The “hook” code, as mentioned above, will just print out the data buffers. The implementation is given below:

void WINAPI BCryptHashDataHook(BCRYPT_HASH_HANDLE hHash, PUCHAR pbInput, ULONG cbInput, ULONG dwFlags)
{
    printf("--- BCryptHashData ---\n"
        "Input: %.*s\n",
        cbInput, pbInput);
}
 
void WINAPI SslEncryptPacketHook(NCRYPT_PROV_HANDLE hSslProvider, NCRYPT_KEY_HANDLE hKey, PBYTE *pbInput, DWORD cbInput,
                              PBYTE pbOutput, DWORD cbOutput, DWORD *pcbResult, ULONGLONG SequenceNumber, DWORD dwContentType, DWORD dwFlags)
{
    printf("--- SslEncryptPacket ---\n"
        "Input: %.*s\n",
        cbInput, pbInput);
}

What Does It Look Like?

After everything is completed, it is time to inspect the protocol. Below are some selected packet logs from a session of Checkers.

STATE {some large uuid}
Length: 0x000003CD
 
<?xml version="1.0"?>
<StateMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="h
ttp://www.w3.org/2001/XMLSchema" xsi:type="StateMessageEx" xmlns="http://zone.ms
n.com/stadium/wincheckers/">
  <nSeq>4</nSeq>
  <nRole>0</nRole>
  <eStatus>Ready</eStatus>
  <nTimestamp>578</nTimestamp>
  <sMode>normal</sMode>
  <arTags>
    <Tag>
      <id>chatbyid</id>
      <oValue xsi:type="ChatTag">
        <UserID>numeric user id</UserID>
        <Nickname>numeric nickname</Nickname>
        <Text>SYSTEM_ENTER</Text>
        <FontFace>MS Shell Dlg</FontFace>
        <FontFlags>0</FontFlags>
        <FontColor>255</FontColor>
        <FontCharSet>1</FontCharSet>
        <MessageFlags>2</MessageFlags>
      </oValue>
    </Tag>
    <Tag>
      <id>STag</id>
      <oValue xsi:type="STag">
        <MsgID>StartCountDownTimer</MsgID>
        <MsgIDSbKy />
        <MsgD>0</MsgD>
      </oValue>
    </Tag>
  </arTags>
</StateMessage>
 
STATE {some large uuid}
Length: 0x000006D1
 
<?xml version="1.0"?>
<StateMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="h
ttp://www.w3.org/2001/XMLSchema" xsi:type="StateMessageEx" xmlns="http://zone.ms
n.com/stadium/wincheckers/">
  <nSeq>5</nSeq>
  <nRole>0</nRole>
  <eStatus>Ready</eStatus>
  <nTimestamp>2234</nTimestamp>
  <sMode>normal</sMode>
  <arTags>
    <Tag>
      <id>STag</id>
      <oValue xsi:type="STag">
        <MsgID>FrameworkUpdate</MsgID>
        <MsgIDSbKy />
        <MsgD>&lt;D&gt;&lt;StgSet&gt;&lt;SeatCnt&gt;2&lt;/SeatCnt&gt;&lt;GameT&g
t;AUTOMATCH&lt;/GameT&gt;&lt;AILvls&gt;2&lt;/AILvls&gt;&lt;GameM&gt;INIT_GAME&lt
;/GameM&gt;&lt;Start&gt;True&lt;/Start&gt;&lt;PMatch&gt;False&lt;/PMatch&gt;&lt;
ShowTeam&gt;False&lt;/ShowTeam&gt;&lt;/StgSet&gt;&lt;/D&gt;</MsgD>
      </oValue>
    </Tag>
    <Tag>
      <id>STag</id>
      <oValue xsi:type="STag">
        <MsgID>GameInit</MsgID>
        <MsgIDSbKy>GameInit</MsgIDSbKy>
        <MsgD>&lt;GameInit&gt;&lt;Role&gt;0&lt;/Role&gt;&lt;Players&gt;&lt;Playe
r&gt;&lt;Role&gt;0&lt;/Role&gt;&lt;Name&gt;8201314a      01&lt;/Name&gt;&lt;Type
&gt;Human&lt;/Type&gt;&lt;/Player&gt;&lt;Player&gt;&lt;Role&gt;1&lt;/Role&gt;&lt
;Name&gt;1d220e29      01&lt;/Name&gt;&lt;Type&gt;Human&lt;/Type&gt;&lt;/Player&
gt;&lt;/Players&gt;&lt;Board&gt;&lt;Row&gt;0,1,0,1,0,1,0,1&lt;/Row&gt;&lt;Row&gt
;1,0,1,0,1,0,1,0&lt;/Row&gt;&lt;Row&gt;0,1,0,1,0,1,0,1&lt;/Row&gt;&lt;Row&gt;0,0
,0,0,0,0,0,0&lt;/Row&gt;&lt;Row&gt;0,0,0,0,0,0,0,0&lt;/Row&gt;&lt;Row&gt;3,0,3,0
,3,0,3,0&lt;/Row&gt;&lt;Row&gt;0,3,0,3,0,3,0,3&lt;/Row&gt;&lt;Row&gt;3,0,3,0,3,0
,3,0&lt;/Row&gt;&lt;/Board&gt;&lt;GameType&gt;Standard&lt;/GameType&gt;&lt;/Game
Init&gt;</MsgD>
      </oValue>
    </Tag>
  </arTags>
</StateMessage>
 
CALL EventSend messageID=EventSend&XMLDataString=%3CMessage%3E%3CMove%3E%
3CSource%3E%3CX%3E6%3C/X%3E%3CY%3E5%3C/Y%3E%3C/Source%3E%3CTarget%3E%3CX%3E7%3C/
X%3E%3CY%3E4%3C/Y%3E%3C/Target%3E%3C/Move%3E%3C/Message%3E
 
CALL EventSend messageID=EventSend&XMLDataString=%3CMessage%3E%3CGameMana
gement%3E%3CMethod%3EResignGiven%3C/Method%3E%3C/GameManagement%3E%3C/Message%3E

The protocol basically screams XML-RPC. It appears that the entire state of the game is initialized and carried out over these XML messages. From a security perspective, it also presents an interesting target to fuzz, given the large variety of fields present within these messages, and the presence of a length field in the message.

Some Issues With This Approach

There are some issues with this approach. Firstly, ncrypt.dll and bcrypt.dll are delay loaded, so our DLL will have to be injected after a multiplayer session starts, or there will have to be some polling loop introduced to check whether these two DLLs have loaded. This is ugly and there is a much better way to get around this that will be talked about in the next post. Secondly, BCryptHashData is used for both incoming and outgoing messages. This makes it more difficult if we wish to mess with these messages as there will have to be logic added to distinguish between client and server messages. This will also be resolved in the next post.

The full source code relating to this can be found here.

December 6, 2013

Calling Undocumented APIs in the Windows Kernel

Filed under: General x86,General x86-64,Reverse Engineering — admin @ 7:51 PM

Background

This post takes a different approach from the others and delves into the world of the Windows kernel. Specifically, it will cover how to access the undocumented APIs that are present within the kernel (ntoskrnl). If you trace a Windows API call from usermode to the kernel, you will find the endpoint to be something similar to what is shown below (Win 8 x64):

public NtOpenFile
NtOpenFile proc near
4C 8B D1                           mov r10, rcx
B8 31 00 00 00                     mov eax, 31h
0F 05                              syscall
C3                                 retn
NtOpenFile endp

where the r10 register holds the value of the first argument and eax holds the index into the Windows internal syscall table. A note should be made that this is specific to a x64 operating system running a native x64 application. x86 systems rely on going through KiFastSystemCall in ntdll to achieve invoking a syscall, and WOW64 emulation relies on making transitions from x64 to x86 and back and setting up an appropriate stack in-between. When the syscall instruction executes, the flow of code will eventually find itself to NtOpenFile in ntoskrnl. This is actually a wrapper around IopCreateFile (shown below):

public NtOpenFile
NtOpenFile proc near
4C 8B DC                            mov     r11, rsp
48 81 EC 88 00 00 00                sub     rsp, 88h
8B 84 24 B8 00 00 00                mov     eax, [rsp+88h+arg_28]
45 33 D2                            xor     r10d, r10d
4D 89 53 F0                         mov     [r11-10h], r10
C7 44 24 70 20 00 00 00             mov     [rsp+88h+var_18], 20h
45 89 53 E0                         mov     [r11-20h], r10d
4D 89 53 D8                         mov     [r11-28h], r10
45 89 53 D0                         mov     [r11-30h], r10d
45 89 53 C8                         mov     [r11-38h], r10d
4D 89 53 C0                         mov     [r11-40h], r10
89 44 24 40                         mov     [rsp+88h+var_48], eax
8B 84 24 B0 00 00 00                mov     eax, [rsp+88h+arg_20]
C7 44 24 38 01 00 00 00             mov     [rsp+88h+var_50], 1
89 44 24 30                         mov     [rsp+88h+var_58], eax
45 89 53 A0                         mov     [r11-60h], r10d
4D 89 53 98                         mov     [r11-68h], r10
E8 48 E2 FC FF                      call    IopCreateFile
48 81 C4 88 00 00 00                add     rsp, 88h
C3                                  retn
NtOpenFile endp

Again it should be noted that there was a lot of hand-waving going on here, and that the syscall instruction does not simply invoke the native kernel API, but goes through several routines responsible for setting up trap frames and performing access checks before arriving at the native API implementation.
Exported native kernel APIs for use in drivers also follow a similar, but nowhere near as complex mechanism. Every Zw* function in the kernel provides a thin wrapper around a call to the Nt* version (example shown below):

NTSTATUS __stdcall ZwOpenFile(PHANDLE FileHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PIO_STATUS_BLOCK IoStatusBlock,
    ULONG ShareAccess,
    ULONG OpenOptions)
ZwOpenFile proc near
48 8B C4                            mov     rax, rsp
FA                                  cli
48 83 EC 10                         sub     rsp, 10h
50                                  push    rax
9C                                  pushfq
6A 10                               push    10h
48 8D 05 BD 2F 00 00                lea     rax, KiServiceLinkage
50                                  push    rax
B8 31 00 00 00                      mov     eax, 31h
E9 C2 DA FF FF                      jmp KiServiceInternal
ZwOpenFile endp

This wrapper does basic things such as set up the stack, disable kernel interrupts (cli), and preserve flags. The KiServiceLinkage function is just a small stub that executes the ret instruction immediately. I have not had a chance to reverse it to see what purpose it serves — it was never even invoked when a breakpoint was set on it. Lastly, the syscall number (0x31) is put into eax and a jump to the KiServiceInternal routine is made. This routine, among other things, is responsible for setting the correct PreviousMode and traversing the Windows syscall table (commonly referred to as the System Service Dispatch Table, or SSDT) and invoking the native Nt* version of the API.

Getting Access to the APIs
So what is the relevance of all of this? The answer is that even though the kernel exports a ton of APIs for kernel/driver developers, there are still plenty of other ones which provide some pretty cool functionality — ones like ZwSuspendProcess/ZwResumeProcess, ZwReadVirtualMemory/ZwWriteVirtualMemory, etc, that are not available. Getting access to those APIs is really where this post begins. Before starting, there are several clear issues that need to be resolved:

  • The base address and image size in memory of the kernel (ntoskrnl) need to be found. This is obviously because the APIs lay somewhere within that memory region.
  • The syscalls need to be identified and there should be a generic way developed to allow us to invoke them.
  • Other issues related to using the APIs should be addressed. For example, process enumeration in the kernel in order to get a valid process handle for the target process in a ZwSuspend/ZwResume call.

Addressing these in order, the first point is relatively simple, but also relies on undocumented features. Getting the address of the kernel in memory is as simple as calling ZwQuerySystemInformation with the undocumented SYSTEM_INFORMATION_CLASS structure. What will be returned is a pointer to a SYSTEM_MODULE_INFORMATION structure containing a count of loaded modules in memory followed by the variable length array of SYSTEM_MODULE pointers. A quick note to add is that the NtInternals documentation on the structure is a bit outdated, and that the first two fields are of type ULONG_PTR instead of always a 32-bit ULONG. Finding the kernel base address and image size is simple a traversal of the SYSTEM_MODULE array and a substring search for the kernel name. The code is shown below:

PSYSTEM_MODULE GetKernelModuleInfo(VOID) {
 
    PSYSTEM_MODULE SystemModule = NULL;
    PSYSTEM_MODULE FoundModule = NULL;
    ULONG_PTR SystemInfoLength = 0;
    PVOID Buffer = NULL;
    ULONG Count = 0;
    ULONG i = 0;
    ULONG j = 0;
    //For names for WinXP
    CONST CHAR *KernelNames[] = { "ntoskrnl.exe", "ntkrnlmp.exe", "ntkrnlpa.exe", "ntkrpamp.exe" };
 
    //Perform error checking on the calls in actual code
    (VOID)ZwQuerySystemInformation(SystemModuleInformation, &SystemInfoLength, 0, &SystemInfoLength);
    Buffer = ExAllocatePool(NonPagedPool, SystemInfoLength);
    (VOID)ZwQuerySystemInformation(SystemModuleInformation, Buffer, SystemInfoLength, NULL);
 
    Count = ((PSYSTEM_MODULE_INFORMATION)Buffer)->ModulesCount;
    for(i = 0; i < Count; ++i) {         
        SystemModule = &((PSYSTEM_MODULE_INFORMATION)Buffer)->Modules[i];
        for(j = 0; j < sizeof(KernelNames) / sizeof(KernelNames[0]); ++j) {             
            if(strstr((LPCSTR)SystemModule->Name, KernelNames[j]) != NULL) {
                FoundModule = (PSYSTEM_MODULE)ExAllocatePool(NonPagedPool, sizeof(SYSTEM_MODULE));
                RtlCopyMemory(FoundModule, SystemModule, sizeof(SYSTEM_MODULE));
                ExFreePool(Buffer);
                return FoundModule;
             }
        }
    }
    DbgPrint("Could not find the kernel in module list\n");
    return NULL;
}

The above function will return the PSYSTEM_MODULE corresponding to information about the kernel (or NULL in the failure case). Now that the base address and image size of the kernel are known, it is possible to begin coming up with a way to invoke the undocumented syscalls.
Since all of the undocumented Zw* calls are nearly identical wrappers (with the exception of the syscall number) invoking KiSystemService, I present the generic way of invoking these calls by creating a functionality equivalent template of this in kernel memory and executing off of that. The general idea is to create a blank template such as the one shown below:

BYTE NullStub = 0xC3;
 
BYTE SyscallTemplate[] = {
    0x48, 0x8B, 0xC4,                                           /*mov rax, rsp*/
    0xFA,                                                       /*cli*/
    0x48, 0x83, 0xEC, 0x10,                                     /*sub rsp, 0x10*/
    0x50,                                                       /*push rax*/
    0x9C,                                                       /*pushfq*/
    0x6A, 0x10,                                                 /*push 0x10*/
    0x48, 0xB8, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, /*mov rax, NullStubAddress*/
    0x50,                                                       /*push rax*/
    0xB8, 0xBB, 0xBB, 0xBB, 0xBB,				/*mov eax, Syscall*/
    0x68, 0xCC, 0xCC, 0xCC, 0xCC,                               /*push LowBytes*/
    0xC7, 0x44, 0x24, 0x04, 0xCC, 0xCC, 0xCC, 0xCC,             /*mov [rsp+0x4], HighBytes*/
    0xC3							/*ret*/
};

in non paged memory, patch in the correct addresses (NullStub replacing KiServiceLinkage), patch in the syscall, then invoke KiSystemService (here done by moving the 64-bit absolute address on the stack and returning to it). Once fully patched at runtime, this data can simply be cased to the appropriate function pointer and invoked like normal. Here is the allocation and patching routine:

PVOID CreateSyscallWrapper(IN LONG Index) {
 
    PVOID Buffer = ExAllocatePool(NonPagedPool, sizeof(SyscallTemplate));
    BYTE *NullStubAddress = &NullStub;
    BYTE *NullStubAddressIndex = ((BYTE *)Buffer) + (14 * sizeof(BYTE));
    BYTE *SyscallIndex = ((BYTE *)Buffer) + (24 * sizeof(BYTE));
    BYTE *LowBytesIndex = ((BYTE *)Buffer) + (29 * sizeof(BYTE));
    BYTE *HighBytesIndex = ((BYTE *)Buffer) + (37 * sizeof(BYTE));
    ULONG LowAddressBytes = ((ULONG_PTR)KiSystemService) & 0xFFFFFFFF;
    ULONG HighAddressBytes = ((ULONG_PTR)KiSystemService >> 32);
    RtlCopyMemory(Buffer, SyscallTemplate, sizeof(SyscallTemplate));
    RtlCopyMemory(NullStubAddressIndex, (PVOID)&NullStubAddress, sizeof(BYTE *));
    RtlCopyMemory(SyscallIndex, &Index, sizeof(LONG));
    RtlCopyMemory(LowBytesIndex, &LowAddressBytes, sizeof(ULONG));
    RtlCopyMemory(HighBytesIndex, &HighAddressBytes, sizeof(ULONG));
    return Buffer;
}

Example usage of this is again shown below:

typedef NTSTATUS (NTAPI *pZwSuspendProcess)(IN HANDLE ProcessHandle);
pZwSuspendProcess ZwSuspendProcess = (pZwSuspendProcess)CreateSyscallWrapper(0x017A, 1);
//This can then be invoked as normal, e.g, ZwSuspendProcess(x);

However, before doing that, the address of KiServiceInternal needs to be found so it can be properly patched in. This is, after all, partially why finding the base address of the kernel was important. This is done through scanning for the function signature through the entirely of ntoskrnl’s memory. The signature must be sufficiently long as to be unique, but preferably not so long that comparisons take a lot of time. The signature that I used for this example is shown below:

typedef VOID (*pKiSystemService)(VOID);
pKiSystemService KiSystemService;
 
NTSTATUS ResolveFunctions(IN PSYSTEM_MODULE KernelInfo) {
    CONST BYTE KiSystemServiceSignature[] =
    {
        0x48, 0x83, 0xEC, 0x08, 0x55, 0x48, 0x81, 0xEC, 0x58, 0x01,
        0x00, 0x00, 0x48, 0x8D, 0xAC, 0x24, 0x80, 0x00, 0x00, 0x00,
        0x48, 0x89, 0x9D, 0xC0, 0x00, 0x00, 0x00, 0x48, 0x89, 0xBD,
        0xC8, 0x00, 0x00, 0x00, 0x48, 0x89, 0xB5, 0xD0, 0x00, 0x00,
        0x00, 0xFB, 0x65, 0x48, 0x8B, 0x1C, 0x25, 0x88, 0x01, 0x00,
        0x00
    };
    KiSystemService = (pKiSystemService)FindFunctionInModule(KiSystemServiceSignature,
        sizeof(KiSystemServiceSignature), KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
        if(KiSystemService == NULL) {
            DbgPrint("- Could not find KiSystemService\n");
            return STATUS_UNSUCCESSFUL;
        }
    DbgPrint("+ Found KiSystemService at %p\n", KiSystemService);
    //....
}
 
...
...
 
PVOID FindFunctionInModule(IN CONST BYTE *Signature, IN ULONG SignatureSize,
    IN PVOID KernelBaseAddress, IN ULONG ImageSize) {
 
    BYTE *CurrentAddress = 0;
    ULONG i = 0;
 
    DbgPrint("+ Scanning from %p to %p\n", KernelBaseAddress, (ULONG_PTR)KernelBaseAddress + ImageSize);
    CurrentAddress = (BYTE *)KernelBaseAddress;
 
    for(i = 0; i < ImageSize; ++i) {
        if(RtlCompareMemory(CurrentAddress, Signature, SignatureSize) == SignatureSize) {
            DbgPrint("+ Found function at %p\n", CurrentAddress);
            return (PVOID)CurrentAddress;
        }
    ++CurrentAddress;
    }
return NULL;
}

Once the ResolveFunctions() function executes, the CreateSyscallWrapper function is ready to be used as shown above. This will now resolve any syscall that you wish to call.

An Example

The code below is an example I wrote up showing how to write into the virtual address space of a target process. This process is given by name to the OpenProcess function, which retrieves the appropriate EPROCESS block corresponding to the process and opens a handle to it. This handle is then used in conjunction with the undocumented APIs associated with process manipulation (ZwSuspendProcess/ZwResumeProcess) and virtual memory manipulation (ZwProtectVirtualMemory/ZwWriteVirtualMemory). An internal undocumented function (PsGetNextProcess) is also scanned for and retrieved in order to help facilitate process enumeration. The code was written for and tested on an x86 version of Windows XP SP3 and x64 Windows 7 SP1.

#include "stdafx.h"
 
#include "Undocumented.h"
#include <wdm.h>
 
#ifdef __cplusplus
extern "C" NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING  RegistryPath);
#endif
 
pPsGetProcessImageFileName PsGetProcessImageFileName;
pPsGetProcessSectionBaseAddress PsGetProcessSectionBaseAddress;
pPsGetNextProcess PsGetNextProcess;
pZwSuspendProcess ZwSuspendProcess;
pZwResumeProcess ZwResumeProcess;
pZwProtectVirtualMemory ZwProtectVirtualMemory;
pZwWriteVirtualMemory ZwWriteVirtualMemory;
pKiSystemService KiSystemService;
 
#ifdef _M_IX86
__declspec(naked) VOID SyscallTemplate(VOID) {
    __asm {
    /*B8 XX XX XX XX   */ mov eax, 0xC0DE
    /*8D 54 24 04      */ lea edx, [esp + 0x4]
    /*9C               */ pushfd
    /*6A 08            */ push 0x8
    /*FF 15 XX XX XX XX*/ call KiSystemService
    /*C2 XX XX         */ retn 0xBBBB
    }
}
#elif defined(_M_AMD64)
 
BYTE NullStub = 0xC3;
 
BYTE SyscallTemplate[] =
{
    0x48, 0x8B, 0xC4,                                           /*mov rax, rsp*/
    0xFA,                                                       /*cli*/
    0x48, 0x83, 0xEC, 0x10,                                     /*sub rsp, 0x10*/
    0x50,                                                       /*push rax*/
    0x9C,                                                       /*pushfq*/
    0x6A, 0x10,                                                 /*push 0x10*/
    0x48, 0xB8, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, /*mov rax, NullStubAddress*/
    0x50,                                                       /*push rax*/
    0xB8, 0xBB, 0xBB, 0xBB, 0xBB,                               /*mov eax, Syscall*/
    0x68, 0xCC, 0xCC, 0xCC, 0xCC,                               /*push LowBytes*/
    0xC7, 0x44, 0x24, 0x04, 0xCC, 0xCC, 0xCC, 0xCC,             /*mov [rsp+0x4], HighBytes*/
    0xC3                                                        /*ret*/
};
#endif
 
PVOID FindFunctionInModule(IN CONST BYTE *Signature,
    IN ULONG SignatureSize,
    IN PVOID KernelBaseAddress,
    IN ULONG ImageSize) {
 
    BYTE *CurrentAddress = 0;
    ULONG i = 0;
 
    DbgPrint("+ Scanning from %p to %p\n", KernelBaseAddress, (ULONG_PTR)KernelBaseAddress + ImageSize);
    CurrentAddress = (BYTE *)KernelBaseAddress;
    DbgPrint("+ Scanning from %p to %p\n", KernelBaseAddress, (ULONG_PTR)KernelBaseAddress + ImageSize);
    CurrentAddress = (BYTE *)KernelBaseAddress;
 
    for(i = 0; i < ImageSize; ++i) {
        if(RtlCompareMemory(CurrentAddress, Signature, SignatureSize) == SignatureSize) {
            DbgPrint("+ Found function at %p\n", CurrentAddress);
            return (PVOID)CurrentAddress;
        }
    ++CurrentAddress;
    }
    return NULL;
}
 
NTSTATUS ResolveFunctions(IN PSYSTEM_MODULE KernelInfo) {
 
    UNICODE_STRING PsGetProcessImageFileNameStr = {0};
    UNICODE_STRING PsGetProcessSectionBaseAddressStr = {0};
#ifdef _M_IX86
    CONST BYTE PsGetNextProcessSignature[] =
    {
        0x8B, 0xFF, 0x55, 0x8B, 0xEC, 0x51, 0x83, 0x65,
        0xFC, 0x00, 0x56, 0x57, 0x64, 0xA1, 0x24, 0x01, 0x00, 0x00,
        0x8B, 0xF0, 0xFF, 0x8E, 0xD4, 0x00, 0x00, 0x00, 0xB9, 0xC0,
        0x38, 0x56, 0x80, 0xE8, 0xB4, 0xEE, 0xF6, 0xFF, 0x8B, 0x45,
        0x08, 0x85, 0xC0
    };
#elif defined(_M_AMD64)
    CONST BYTE PsGetNextProcessSignature[] =
    {
        0x48, 0x89, 0x5C, 0x24, 0x08, 0x48, 0x89, 0x6C, 0x24, 0x10,
        0x48, 0x89, 0x74, 0x24, 0x18, 0x57, 0x41, 0x54, 0x41, 0x55,
        0x41, 0x56, 0x41, 0x57, 0x48, 0x83, 0xEC, 0x20, 0x65, 0x48,
        0x8B, 0x34, 0x25, 0x88, 0x01, 0x00, 0x00, 0x45, 0x33, 0xED,
        0x48, 0x8B, 0xF9, 0x66, 0xFF, 0x8E, 0xC6, 0x01, 0x00, 0x00,
        0x4D, 0x8B, 0xE5, 0x41, 0x8B, 0xED, 0x41, 0x8D, 0x4D, 0x11,
        0x33, 0xC0,
    };
#endif
#ifdef _M_IX86
    CONST BYTE KiSystemServiceSignature[] =
    {
        0x6A, 0x00, 0x55, 0x53, 0x56, 0x57, 0x0F, 0xA0, 0xBB, 0x30,
        0x00, 0x00, 0x00, 0x66, 0x8E, 0xE3, 0x64, 0xFF, 0x35, 0x00,
        0x00, 0x00, 0x00
    };
#elif defined(_M_AMD64)
    CONST BYTE KiSystemServiceSignature[] =
    {
        0x48, 0x83, 0xEC, 0x08, 0x55, 0x48, 0x81, 0xEC, 0x58, 0x01,
        0x00, 0x00, 0x48, 0x8D, 0xAC, 0x24, 0x80, 0x00, 0x00, 0x00,
        0x48, 0x89, 0x9D, 0xC0, 0x00, 0x00, 0x00, 0x48, 0x89, 0xBD,
        0xC8, 0x00, 0x00, 0x00, 0x48, 0x89, 0xB5, 0xD0, 0x00, 0x00,
        0x00, 0xFB, 0x65, 0x48, 0x8B, 0x1C, 0x25, 0x88, 0x01, 0x00,
        0x00
    };
#endif
    RtlInitUnicodeString(&PsGetProcessImageFileNameStr, L"PsGetProcessImageFileName");
    RtlInitUnicodeString(&PsGetProcessSectionBaseAddressStr, L"PsGetProcessSectionBaseAddress");
 
    PsGetProcessImageFileName = (pPsGetProcessImageFileName)MmGetSystemRoutineAddress(&PsGetProcessImageFileNameStr);
    if(PsGetProcessImageFileName == NULL) {
        DbgPrint("- Could not find PsGetProcessImageFileName\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found PsGetProcessImageFileName at %p\n", PsGetProcessImageFileName);
 
    PsGetProcessSectionBaseAddress = (pPsGetProcessSectionBaseAddress)MmGetSystemRoutineAddress(&PsGetProcessSectionBaseAddressStr);
    if(PsGetProcessSectionBaseAddress == NULL) {
        DbgPrint("- Could not find PsGetProcessSectionBaseAddress\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found PsGetProcessSectionBaseAddress at %p\n", PsGetProcessSectionBaseAddress);
 
    PsGetNextProcess = (pPsGetNextProcess)FindFunctionInModule(PsGetNextProcessSignature,
        sizeof(PsGetNextProcessSignature), KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
    if(PsGetNextProcess == NULL) {
        DbgPrint("- Could not find PsGetNextProcess\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found PsGetNextProcess at %p\n", PsGetNextProcess);
 
    KiSystemService = (pKiSystemService)FindFunctionInModule(KiSystemServiceSignature,
        sizeof(KiSystemServiceSignature), KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
    if(KiSystemService == NULL) {
        DbgPrint("- Could not find KiSystemService\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found KiSystemService at %p\n", KiSystemService);
 
    return STATUS_SUCCESS;
}
 
VOID OnUnload(IN PDRIVER_OBJECT DriverObject) {
 
    DbgPrint("+ Unloading\n");
}
 
PSYSTEM_MODULE GetKernelModuleInfo(VOID) {
 
    PSYSTEM_MODULE SystemModule = NULL;
    PSYSTEM_MODULE FoundModule = NULL;
    ULONG_PTR SystemInfoLength = 0;
    PVOID Buffer = NULL;
    ULONG Count = 0;
    ULONG i = 0;
    ULONG j = 0;
    //Other names for WinXP
    CONST CHAR *KernelNames[] = { "ntoskrnl.exe", "ntkrnlmp.exe", "ntkrnlpa.exe", "ntkrpamp.exe" };
 
    //Perform error checking on the calls in actual code
    (VOID)ZwQuerySystemInformation(SystemModuleInformation, &SystemInfoLength, 0, &SystemInfoLength);
    Buffer = ExAllocatePool(NonPagedPool, SystemInfoLength);
    (VOID)ZwQuerySystemInformation(SystemModuleInformation, Buffer, SystemInfoLength, NULL);
 
    Count = ((PSYSTEM_MODULE_INFORMATION)Buffer)->ModulesCount;
    for(i = 0; i < Count; ++i) {
        SystemModule = &((PSYSTEM_MODULE_INFORMATION)Buffer)->Modules[i];
        for(j = 0; j < sizeof(KernelNames) / sizeof(KernelNames[0]); ++j) {
            if(strstr((LPCSTR)SystemModule->Name, KernelNames[j]) != NULL) {
                FoundModule = (PSYSTEM_MODULE)ExAllocatePool(NonPagedPool, sizeof(SYSTEM_MODULE));
                RtlCopyMemory(FoundModule, SystemModule, sizeof(SYSTEM_MODULE));
                ExFreePool(Buffer);
                return FoundModule;
            }
        }
    }
    DbgPrint("Could not find the kernel in module list\n");
    return NULL;
}
 
PEPROCESS GetEPROCESSFromName(IN CONST CHAR *ImageName) {
 
    PEPROCESS ProcessHead = PsGetNextProcess(NULL);
    PEPROCESS Process = PsGetNextProcess(NULL);
    CHAR *ProcessName = NULL;
 
    do {
        ProcessName = PsGetProcessImageFileName(Process);
        DbgPrint("+ Currently looking at %s\n", ProcessName);
        if(strstr(ProcessName, ImageName) != NULL) {
            DbgPrint("+ Found the process -- %s\n", ProcessName);
            return Process;
        }
        Process = PsGetNextProcess(Process);
    } while(Process != NULL && Process != ProcessHead);
    DbgPrint("- Could not find %s\n", ProcessName);
    return NULL;
}
 
HANDLE GetProcessIdFromEPROCESS(PEPROCESS Process) {
 
    return PsGetProcessId(Process);
}
 
HANDLE OpenProcess(IN CONST CHAR *ProcessName, OUT OPTIONAL PEPROCESS *pProcess) {
 
    HANDLE ProcessHandle = NULL;
    CLIENT_ID ClientId = {0};
    OBJECT_ATTRIBUTES ObjAttributes = {0};
    PEPROCESS EProcess = GetEPROCESSFromName(ProcessName);
    NTSTATUS Status = STATUS_UNSUCCESSFUL;
 
    if(EProcess == NULL) {
        return NULL;
    }
    InitializeObjectAttributes(&ObjAttributes, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);
    ObjAttributes.ObjectName = NULL;
    ClientId.UniqueProcess = GetProcessIdFromEPROCESS(EProcess);
    ClientId.UniqueThread = NULL;
 
    Status = ZwOpenProcess(&ProcessHandle, PROCESS_ALL_ACCESS, &ObjAttributes, &ClientId);
    if(!NT_SUCCESS(Status)) {
        DbgPrint("- Could not open process %s. -- %X\n", ProcessName, Status);
        return NULL;
    }
    if(pProcess != NULL) {
        *pProcess = EProcess;
    }
    return ProcessHandle;
}
 
PVOID CreateSyscallWrapper(IN LONG Index, IN SHORT NumParameters) {
 
#ifdef _M_IX86
    SIZE_T StubLength = 0x15;
    PVOID Buffer = ExAllocatePool(NonPagedPool, StubLength);
    BYTE *SyscallIndex = ((BYTE *)Buffer) + sizeof(BYTE);
    BYTE *Retn = ((BYTE *)Buffer) + (0x13 * (sizeof(BYTE)));
    RtlCopyMemory(Buffer, SyscallTemplate, StubLength);
    NumParameters = NumParameters * sizeof(ULONG_PTR);
    RtlCopyMemory(SyscallIndex, &Index, sizeof(LONG));
    RtlCopyMemory(Retn, &NumParameters, sizeof(SHORT));
    return Buffer;
#elif defined(_M_AMD64)
    PVOID Buffer = ExAllocatePool(NonPagedPool, sizeof(SyscallTemplate));
    BYTE *NullStubAddress = &NullStub;
    BYTE *NullStubAddressIndex = ((BYTE *)Buffer) + (14 * sizeof(BYTE));
    BYTE *SyscallIndex = ((BYTE *)Buffer) + (24 * sizeof(BYTE));
    BYTE *LowBytesIndex = ((BYTE *)Buffer) + (29 * sizeof(BYTE));
    BYTE *HighBytesIndex = ((BYTE *)Buffer) + (37 * sizeof(BYTE));
    ULONG LowAddressBytes = ((ULONG_PTR)KiSystemService) & 0xFFFFFFFF;
    ULONG HighAddressBytes = ((ULONG_PTR)KiSystemService >> 32);
    RtlCopyMemory(Buffer, SyscallTemplate, sizeof(SyscallTemplate));
    RtlCopyMemory(NullStubAddressIndex, (PVOID)&NullStubAddress, sizeof(BYTE *));
    RtlCopyMemory(SyscallIndex, &Index, sizeof(LONG));
    RtlCopyMemory(LowBytesIndex, &LowAddressBytes, sizeof(ULONG));
    RtlCopyMemory(HighBytesIndex, &HighAddressBytes, sizeof(ULONG));
    return Buffer;
#endif
}
 
VOID InitializeSyscalls(VOID) {
 
#ifdef _M_IX86
    ZwSuspendProcess = (pZwSuspendProcess)CreateSyscallWrapper(0x00FD, 1);
    ZwResumeProcess = (pZwResumeProcess)CreateSyscallWrapper(0x00CD, 1);
    ZwProtectVirtualMemory = (pZwProtectVirtualMemory)CreateSyscallWrapper(0x0089, 5);
    ZwWriteVirtualMemory = (pZwWriteVirtualMemory)CreateSyscallWrapper(0x0115, 5);
#elif defined(_M_AMD64)
    ZwSuspendProcess = (pZwSuspendProcess)CreateSyscallWrapper(0x017A, 1);
    ZwResumeProcess = (pZwResumeProcess)CreateSyscallWrapper(0x0144, 1);
    ZwProtectVirtualMemory = (pZwProtectVirtualMemory)CreateSyscallWrapper(0x004D, 5);
    ZwWriteVirtualMemory = (pZwWriteVirtualMemory)CreateSyscallWrapper(0x0037, 5);
#endif
}
 
VOID FreeSyscalls(VOID) {
 
    ExFreePool(ZwSuspendProcess);
    ExFreePool(ZwResumeProcess);
    ExFreePool(ZwProtectVirtualMemory);
    ExFreePool(ZwWriteVirtualMemory);
}
 
PVOID GetProcessBaseAddress(IN PEPROCESS Process) {
 
    return PsGetProcessSectionBaseAddress(Process);
}
 
NTSTATUS WriteToProcessAddress(IN HANDLE ProcessHandle, IN PVOID BaseAddress, IN BYTE *NewBytes, IN SIZE_T NewBytesSize) {
 
    ULONG OldProtections = 0;
    SIZE_T BytesWritten = 0;
    SIZE_T NumBytesToProtect = NewBytesSize;
    NTSTATUS Status = STATUS_UNSUCCESSFUL;
 
    //Needs error checking
    Status = ZwSuspendProcess(ProcessHandle);
    Status = ZwProtectVirtualMemory(ProcessHandle, &BaseAddress, &NumBytesToProtect, PAGE_EXECUTE_READWRITE, &OldProtections);
    Status = ZwWriteVirtualMemory(ProcessHandle, BaseAddress, NewBytes, NewBytesSize, &BytesWritten);
    Status = ZwProtectVirtualMemory(ProcessHandle, &BaseAddress, &NumBytesToProtect, OldProtections, &OldProtections);
    Status = ZwResumeProcess(ProcessHandle);
 
    return STATUS_SUCCESS;
}
 
NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING  RegistryPath) {
 
    PSYSTEM_MODULE KernelInfo = NULL;
    PEPROCESS Process = NULL;
    HANDLE ProcessHandle = NULL;
    PVOID BaseAddress = NULL;
    BYTE NewBytes[0x100] = {0};
    NTSTATUS Status = STATUS_UNSUCCESSFUL;
 
    DbgPrint("+ Driver successfully loaded\n");
 
    DriverObject->DriverUnload = OnUnload;
 
    KernelInfo = GetKernelModuleInfo();
    if(KernelInfo == NULL) {
        DbgPrint("Could not find kernel module\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found kernel module.\n"
        "+ Name: %s -- Base address: %p -- Size: %p\n", KernelInfo->Name,
        KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
 
    if(!NT_SUCCESS(ResolveFunctions(KernelInfo))) {
        return STATUS_UNSUCCESSFUL;
    }
 
    InitializeSyscalls();
 
    ProcessHandle = OpenProcess("notepad.exe", &Process);
    if(ProcessHandle == NULL) {
        return STATUS_UNSUCCESSFUL;
    }
    BaseAddress = GetProcessBaseAddress(Process);
    if(BaseAddress == NULL) {
        return STATUS_UNSUCCESSFUL;
    }
 
    DbgPrint("Invoking\n");
    RtlFillMemory(NewBytes, sizeof(NewBytes), 0x90);
    (VOID)WriteToProcessAddress(ProcessHandle, BaseAddress, NewBytes, sizeof(NewBytes));
    DbgPrint("+ Done\n");
 
    ExFreePool(KernelInfo);
    FreeSyscalls();
    ZwClose(ProcessHandle);
 
    return STATUS_SUCCESS;
}
« Newer PostsOlder Posts »

Powered by WordPress