RCE Endeavors 😅

January 15, 2015

Virtual Method Table (VMT) Hooking

Filed under: Game Hacking,General x86,General x86-64,Programming — admin @ 1:39 PM

This post will cover the topic of hooking a class’s virtual method table. This is a useful technique that has many applications, but is most commonly seen in developing game hacks. For example, VMT hooking of objects in a Direct3D/OpenGL graphics engine is a common way for in-game overlays to be displayed.

Virtual Method Tables (or vtables)

VMTs, in the context of C++ for this post, are how compilers typically implement runtime polymorphism. Internally, the VMT is represented as an array of function pointers, and a pointer to it typically resides at the beginning or end of the memory layout of the object. Whenever a C++ class declares a virtual function, the compiler will add an entry into the VMT for it. If a class inherits from a base class and overrides a base virtual function, then the pointer to the overridden function will be present in the derived class’s VMT. For example, take the following code, compiled with the VS 2013 compiler on an x86 system:

class Base
{
public:
    Base() { printf("-  Base::Base\n"); }
    virtual ~Base() { printf("-  Base::~Base\n"); }
 
    void A() { printf("-  Base::A\n"); }
    virtual void B() { printf("-  Base::B\n"); }
    virtual void C() { printf("-  Base::C\n"); }
};
 
class Derived final : public Base
{
public:
    Derived() { printf("-  Derived::Derived\n"); }
    ~Derived() { printf("-  Derived::~Derived\n"); }
 
    void B() override { printf("-  Derived::B\n"); }
    void C() override { printf("-  Derived::C\n"); }
};

with the instances of Base and Derived created as follows:

Base base;
Derived derived;
Base *pBase = new Derived;

The class Base has three virtual functions: ~Base, B, and C. The class Derived, which inherits from Base, overrides the two virtual functions B and C. In memory, the VMT for Base will contain ~Base, B, and C, as can be inspected with the debugger:

[Image: vt1]

while the VMT for the two Derived instances contains ~Derived, B, and C, but with different addresses for each than the ones in Base (see below).

[Image: vt3]

[Image: vt2]

So how are these actually used? Take, for example, a function that takes a pointer to a Base instance and invokes the functions A, B, and C on it:

void Invoke(Base * const pBase)
{
    pBase->A();
    pBase->B();
    pBase->C();
}

and is invoked in the following manner:

    Invoke(&base);
    Invoke(&derived);
    Invoke(pBase);

The Invoke function disassembled for x86 is as follows:

    pBase->A();
004012C9 8B 4D 08             mov         ecx,dword ptr [pBase]  
004012CC E8 8F FE FF FF       call        Base::A (0401160h)  
    pBase->B();
004012D1 8B 45 08             mov         eax,dword ptr [pBase]  
004012D4 8B 10                mov         edx,dword ptr [eax]  
004012D6 8B 4D 08             mov         ecx,dword ptr [pBase]  
004012D9 8B 42 04             mov         eax,dword ptr [edx+4]  
004012DC FF D0                call        eax  
    pBase->C();
004012DE 8B 45 08             mov         eax,dword ptr [pBase]  
004012E1 8B 10                mov         edx,dword ptr [eax]  
004012E3 8B 4D 08             mov         ecx,dword ptr [pBase]  
004012E6 8B 42 08             mov         eax,dword ptr [edx+8]  
004012E9 FF D0                call        eax  

This disassembly shows exactly what is going on under the hood with relation to polymorphism. A is not virtual, so the compiler emits a direct call to Base::A. For the invocations of B and C, the compiler moves the address of the object into the EAX register. This is then dereferenced to get the base of the VMT, which is stored in the EDX register. The appropriate VMT entry is found by adding the function’s offset to EDX (4 for B, 8 for C; offset 0 holds the destructor) and loading the address stored there into EAX. The address of the object is also placed in ECX as the this pointer, and the function is then called. Since Base and Derived have different VMTs, this code will call the appropriate function for the object’s type. Seeing how it’s done under the hood also allows us to easily write a function to print the VMT.

void PrintVTable(Base * const pBase)
{
    //The first pointer-sized value in the object is the address of its VMT
    void **pVTableBase = *(void ***)pBase;
    printf("First: %p\n"
        "Second: %p\n"
        "Third: %p\n",
        pVTableBase[0], pVTableBase[1], pVTableBase[2]);
}
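
The load, index, and call sequence in the disassembly can also be modeled in portable C++ with an explicit table of function pointers. The following is a minimal sketch of that idea, not what the compiler literally emits; Object, Method, kBaseTable, kDerivedTable, CallB, and CallC are hypothetical names introduced only for illustration.

```cpp
#include <string>

// Hand-rolled model of virtual dispatch. Every name here is hypothetical.
struct Object;
using Method = std::string (*)(Object *self);

struct Object
{
    const Method *vptr; // first member: points at the table for the object's type
};

std::string BaseB(Object *) { return "Base::B"; }
std::string BaseC(Object *) { return "Base::C"; }
std::string DerivedB(Object *) { return "Derived::B"; }
std::string DerivedC(Object *) { return "Derived::C"; }

// One table per type, shared by every instance of that type.
const Method kBaseTable[] = { BaseB, BaseC };
const Method kDerivedTable[] = { DerivedB, DerivedC };

// Equivalent of pBase->B(): load the table pointer, index it, call through it.
std::string CallB(Object *obj) { return obj->vptr[0](obj); }

// Equivalent of pBase->C(): same sequence with the next table slot.
std::string CallC(Object *obj) { return obj->vptr[1](obj); }
```

An Object initialized with kDerivedTable and passed to CallB returns "Derived::B", which mirrors how Invoke reaches Derived::B through a Base pointer.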

Hooking the VMT

Knowing the layout of the VMT makes it trivial to hook. All that is needed is to overwrite the entry in the VMT with the address of the desired hook function. This is done by using the VirtualProtect function to set the appropriate memory permissions, along with memcpy to write in the desired hook address. Note that memcpy is used since everything resides within the same address space; otherwise WriteProcessMemory would have to be used. A hooking routine might look like the following:

void HookVMT(Base * const pBase)
{
    //The first pointer-sized value in the object is the address of its VMT
    unsigned int *pVTableBase = (unsigned int *)(*(unsigned int *)pBase);
    //Entry 1 is B; entry 0 is the destructor
    unsigned int *pVTableFnc = (unsigned int *)((pVTableBase + 1));
    void *pHookFnc = (void *)VMTHookFnc;

    //VirtualProtect returns the previous protection through a DWORD
    DWORD dwOldProtect = 0;
    (void)VirtualProtect(pVTableFnc, sizeof(void *), PAGE_EXECUTE_READWRITE, &dwOldProtect);
    memcpy(pVTableFnc, &pHookFnc, sizeof(void *));
    (void)VirtualProtect(pVTableFnc, sizeof(void *), dwOldProtect, &dwOldProtect);
}

with VMTHookFnc having a simple definition of

void __fastcall VMTHookFnc(void *pEcx, void *pEdx)
{
    Base *pThisPtr = (Base *)pEcx;
 
    printf("In VMTHookFnc\n");
}

Here the fastcall calling convention is used to easily retrieve the this pointer, which is typically stored in the ECX register.
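
Because a VMT is shared by every instance of a class, overwriting an entry redirects the call for all of them, not just for the object that was used to find the table. A portable sketch of this, along with saving the original entry so that the hook can forward to it (as the Present hook later in this post does), might look like the following. It uses a hand-built, writable table rather than a real compiler-generated VMT, so no VirtualProtect call is involved, and all of the names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Portable model of a VMT hook. Every name here is hypothetical.
struct Object;
using Method = std::string (*)(Object *self);

struct Object
{
    Method *vptr; // points at a writable table in this model
};

std::string OriginalB(Object *) { return "Base::B"; }

// The table is shared, so patching it affects every object that points at it.
Method g_table[] = { OriginalB };

// Saved original entry so that the hook can forward to it.
Method g_originalB = nullptr;

std::string HookB(Object *self)
{
    return "hooked " + g_originalB(self);
}

// Swap one table slot and return the previous entry.
Method HookSlot(Object *obj, std::size_t index, Method replacement)
{
    Method previous = obj->vptr[index];
    obj->vptr[index] = replacement;
    return previous;
}
```

Calling HookSlot a second time with the saved entry restores the table, which is the model’s equivalent of removing the hook.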

Applications

The application of this technique will show how to hook IDXGISwapChain::Present and allow for rendering/overlaying of text on a Direct3D10 application. This is not the only way to overlay text, nor necessarily the best, but still provides an adequate example to illustrate the point. The target application will be a Direct3D10 sample provided by the June 2010 DirectX SDK. See /Samples/C++/Direct3D10/Tutorials/Tutorial01 in the SDK. The sample application initializes the Direct3D device and swap chain with a call to D3D10CreateDeviceAndSwapChain then simply sets up a view and renders a blue background on the window (screenshot below).

[Image: screen1]

To overlay text on a Direct3D application, the IDXGISwapChain object must be obtained. Then the Present function of the interface must be hooked, since that is the function responsible for showing the rendered image to the user. This is done here by hooking D3D10CreateDeviceAndSwapChain. Once this function is hooked, the hook will call the real D3D10CreateDeviceAndSwapChain function in order to set up the IDXGISwapChain interface. Then the VMT entry for Present will be replaced with a hooked version that renders text. Put into code it looks like the following:

HRESULT WINAPI D3D10CreateDeviceAndSwapChainHook(IDXGIAdapter *pAdapter, D3D10_DRIVER_TYPE DriverType, HMODULE Software,
    UINT Flags, UINT SDKVersion, DXGI_SWAP_CHAIN_DESC *pSwapChainDesc, IDXGISwapChain **ppSwapChain,
    ID3D10Device **ppDevice)
{
 
    printf("In D3D10CreateDeviceAndSwapChainHook\n");
 
    //Create the device and swap chain
    HRESULT hResult = pD3D10CreateDeviceAndSwapChain(pAdapter, DriverType, Software, Flags, SDKVersion,
        pSwapChainDesc, ppSwapChain, ppDevice);
 
    //Save the device and swap chain interface.
    //These aren't used in this example but are generally nice to have addresses to
    if(ppSwapChain == NULL)
    {
        printf("Swap chain is NULL.\n");
        return hResult;
    }
    else
    {
        pSwapChain = *ppSwapChain;
    }
    if(ppDevice == NULL)
    { 
        printf("Device is NULL.\n");
        return hResult;
    }
    else
    {
        pDevice = *ppDevice;
    }
 
    //Get the vtable address of the swap chain's Present function and modify it with our own.
    //Save it to return to later in our Present hook
    if(pSwapChain != NULL)
    {
        DWORD_PTR *SwapChainVTable = (DWORD_PTR *)pSwapChain;
        SwapChainVTable = (DWORD_PTR *)SwapChainVTable[0];
        printf("Swap chain VTable: %X\n", SwapChainVTable);
        PresentAddress = (pPresent)SwapChainVTable[8];
        printf("Present address: %X\n", PresentAddress);
 
        DWORD OldProtections = 0;
        VirtualProtect(&SwapChainVTable[8], sizeof(DWORD_PTR), PAGE_EXECUTE_READWRITE, &OldProtections);
        SwapChainVTable[8] = (DWORD_PTR)PresentHook;
        VirtualProtect(&SwapChainVTable[8], sizeof(DWORD_PTR), OldProtections, &OldProtections);
    }
 
    //Create the font that we will be drawing with
    CreateDrawingFont();
 
    return hResult;
}

CreateDrawingFont simply sets up an ID3DX10Font to draw with. Now that the VMT entry has been replaced, PresentHook will be invoked instead of Present. Here is where the drawing can be done.

HRESULT WINAPI PresentHook(IDXGISwapChain *thisAddr, UINT SyncInterval, UINT Flags)
{
 
    //printf("In Present (%X)\n", PresentAddress);
 
    RECT Rect = { 100, 100, 200, 200 };
    pFont->DrawTextW(NULL, L"Hello, World!", -1, &Rect, DT_CENTER | DT_NOCLIP, RED);
    return PresentAddress(thisAddr, SyncInterval, Flags);
}

I chose a different calling convention here than for the earlier example code, but everything still functions the same. The end result shows the Present hook successfully rendering the text:

[Image: screen2]

A few important caveats about doing it this way:

  • The hook must be installed prior to the call to D3D10CreateDeviceAndSwapChain. Otherwise handles to the device and swap chain won’t be obtained.
  • ID3DX10Font::DrawText can mess with the blend states, shaders, rasterizer, etc. Overlaying text on an application that makes use of these requires the hook developer to account for this and save/restore the states properly.

The source code for the VMT hook example, the slightly modified Direct3D10 sample application, and the Direct3D10 hook can be found here. The hook uses Microsoft Detours as a dependency to perform the initial hooking of D3D10CreateDeviceAndSwapChain.

December 20, 2014

Writing a Primitive Debugger: Part 5 (Miscellaneous)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 2:43 PM

Welcome to the final installment of how to write a primitive debugger. This post will cover some miscellaneous topics that were not present in the previous articles in order to add some missing core functionality. The topics covered here will be how to display a disassembly listing, how to step over code, i.e. step past a conditional branch, and how to dump and modify arbitrary memory of a process.

Disassembly

In order to display a disassembly dump on x86 and x64, this debugger will take advantage of the BeaEngine disassembly library. This is a very handy library that supports the 16/32/64-bit Intel instruction sets as well as floating point and vector extensions. The project is open source for those interested in looking at the internals of the disassembler. In the example code, it is distributed as DLLs that the code loads and uses at runtime. This is done as a convenience to avoid possibly having to recompile static libraries.

The disassembler code will be pretty straightforward to work with. BeaEngine has a DISASM structure that needs to be initialized with the architecture type and an address. This is then passed along to a Disasm function, which fills the structure with information about the instruction at the address. Since the disassembler is dynamically loaded, and is used for x86/x64 in the same code, the function pointer to Disasm needs to be retrieved. All of this initialization code can be handled in the constructor.

Disassembler::Disassembler(HANDLE hProcess) : m_hProcess{ hProcess }
{
    memset(&m_disassembler, 0, sizeof(DISASM));
#ifdef _M_IX86
    m_disassembler.Archi = 0;
    if (m_hDll == nullptr)
    {
        m_hDll = LoadLibrary(L"BeaEngine_x86.dll");
        m_pDisasm = (pDisasm)GetProcAddress(m_hDll, "_Disasm@4");
    }
#elif defined _M_AMD64
    m_disassembler.Archi = 64;
    if(m_hDll == nullptr)
    {
        m_hDll = LoadLibrary(L"BeaEngine_x64.dll");
        m_pDisasm = (pDisasm)GetProcAddress(m_hDll, "Disasm");
    }
#else
#error "Unsupported architecture"
#endif
}

with m_hDll and m_pDisasm being static, since there’s no need to retrieve these per instance. Since the code is meant to work on x86/x64, there are two separate versions of the DLL provided — one for use in x86 applications, the other for x64.

Now that the disassembly engine is loaded and initialized, it is time to actually begin disassembling code. There is an interesting problem that comes up, however. The debugger is attached to another process, but the disassembler is given an address in the current address space to disassemble at, i.e. the user can request disassembly at address 0x00411000 when prompted. The disassembly at address 0x00411000 in the debugger doesn’t have any relation to the disassembly at address 0x00411000 in the target, due to how virtual memory works. So the solution isn’t as easy as setting the target address to disassemble at to 0x00411000 and calling Disasm.

Instead, the memory at 0x00411000 in the target process must be read, and that copy is what gets disassembled. Something like this was already done when implementing Interrupt Breakpoints; the original byte at the address was saved before replacing it with a 0xCC opcode. Here it is still as simple as calling ReadProcessMemory and storing the buffer.

const bool Disassembler::TransferBytes(const DWORD_PTR dwAddress)
{
    SIZE_T ulBytesRead = 0;
    bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess, (LPCVOID)dwAddress, m_bytes.data(), m_bytes.size(), &ulBytesRead));
    if (bSuccess && ulBytesRead == m_bytes.size())
    {
        return true;
    }
    else
    {
        fprintf(stderr, "Could not read from %p. Error = %X\n", dwAddress, GetLastError());
    }
 
    return false;
}

Once that is done, the disassembly process is no more difficult than the BeaEngine example. The target disassembly address is set and the Disasm function is called through the function pointer retrieved from the DLL. This function fills the DISASM structure (m_disassembler in the code), and returns the length of the instruction. This can be added to the previous address to get the address of the next instruction, and the process repeats.

const bool Disassembler::BytesAtAddress(DWORD_PTR dwAddress, size_t ulInstructionsToDisassemble /*= 15*/)
{
    if (IsInitialized())
    {
        SetDisassembler(dwAddress);
        bool bFailed = false;
        while (!bFailed && ulInstructionsToDisassemble-- > 0)
        {
            int iDisasmLength = m_pDisasm(&m_disassembler);
            if (iDisasmLength != UNKNOWN_OPCODE)
            {
                fprintf(stderr, "0x%p - %s\n", dwAddress, m_disassembler.CompleteInstr);
                m_disassembler.EIP += iDisasmLength;
                dwAddress += iDisasmLength;
            }
            else
            {
                fprintf(stderr, "Error: Reached unknown opcode in disassembly.\n");
                bFailed = true;
            }
        }
    }
    else
    {
        fprintf(stderr, "Could not show disassembly at address. Disassembler Dll was not loaded properly.\n");
        return false;
    }
 
    return true;
}
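
The core of this loop (decode, print, advance by the instruction length) can be tried without BeaEngine. The sketch below substitutes a toy length decoder that only understands the handful of opcodes appearing in the listing of the a function later in this post; it is a stand-in for Disasm, not a real x86 decoder, and ToyInstructionLength and SweepAddresses are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy length decoder covering only the opcodes in this post's listing of a().
// It stands in for BeaEngine's Disasm and returns 0 for anything else.
std::size_t ToyInstructionLength(const std::uint8_t *code, std::size_t remaining)
{
    if (remaining == 0) return 0;
    std::size_t length = 0;
    switch (code[0])
    {
    case 0x68: length = 5; break; // push imm32
    case 0xE8: length = 5; break; // call rel32
    case 0x83: length = 3; break; // add r/m32, imm8 (assumes a register operand)
    case 0x5F: length = 1; break; // pop edi
    case 0xFF: // call dword ptr [disp32], recognized only with ModRM byte 0x15
        length = (remaining >= 2 && code[1] == 0x15) ? 6 : 0;
        break;
    default: break;
    }
    return (length <= remaining) ? length : 0;
}

// Linear sweep: decode, record the address, advance by the instruction length.
std::vector<std::uint32_t> SweepAddresses(const std::vector<std::uint8_t> &bytes,
    std::uint32_t startAddress)
{
    std::vector<std::uint32_t> addresses;
    std::size_t offset = 0;
    while (offset < bytes.size())
    {
        const std::size_t length = ToyInstructionLength(&bytes[offset], bytes.size() - offset);
        if (length == 0) break; // unknown opcode or truncated instruction
        addresses.push_back(startAddress + static_cast<std::uint32_t>(offset));
        offset += length;
    }
    return addresses;
}
```

Fed the twenty bytes starting at 0x00401009 from that listing, the sweep visits 0x00401009, 0x0040100E, 0x00401014, 0x00401017, and 0x0040101C, the same addresses the debugger prints.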

The SetDisassembler function is responsible for setting the correct starting address in the debugger’s local copy of the target process’s memory. The debugger keeps a 4096 byte cache (the default Windows page size) and uses that if the target to disassemble exists within that range. Otherwise, a read is performed again and the cache is re-initialized.

void Disassembler::SetDisassembler(const DWORD_PTR dwAddress)
{
    //Cached only if the address lies inside [m_dwStartAddress, m_dwStartAddress + size)
    bool bIsCached = ((dwAddress - m_dwStartAddress) < m_bytes.size());
    bIsCached &= (dwAddress >= m_dwStartAddress);
    if (!bIsCached)
    {
        (void)TransferBytes(dwAddress);
        m_disassembler.EIP = (UIntPtr)m_bytes.data();
        m_dwStartAddress = dwAddress;
    }
    else
    {
        m_disassembler.EIP = (UIntPtr)&m_bytes.data()[dwAddress - m_dwStartAddress];
    }
}
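
The range test for the cache can be written as a single unsigned comparison, since subtracting a larger start address from a smaller address wraps around to a very large value. A minimal standalone sketch of that check follows; IsInCache and CacheOffset are hypothetical helpers and not part of the debugger source.

```cpp
#include <cstddef>
#include <cstdint>

// True when address lies inside [cacheStart, cacheStart + cacheSize).
// Unsigned subtraction wraps around, so an address below cacheStart yields a
// huge difference and fails the same comparison.
bool IsInCache(std::uintptr_t address, std::uintptr_t cacheStart, std::size_t cacheSize)
{
    return (address - cacheStart) < cacheSize;
}

// Offset of address inside the cached buffer; only meaningful on a cache hit.
std::size_t CacheOffset(std::uintptr_t address, std::uintptr_t cacheStart)
{
    return static_cast<std::size_t>(address - cacheStart);
}
```

With a 4096 byte cache starting at 0x00411000, the addresses 0x00411000 through 0x00411FFF are hits, while 0x00410FFF and 0x00412000 are misses that would trigger a fresh read.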

And that’s all it takes. The debugger can now print a disassembly listing at any readable address.

Step Over

Step into is the ability to step one instruction at a time as it executes and is something that is supported at the hardware level with the single step flag. Step over is implemented purely in code and is a convenience function that lets the user skip stepping into branches in the code. For example, take the following disassembly listing:

0040108D 81 C4 C0 00 00 00    add         esp,0C0h  
00401093 3B EC                cmp         ebp,esp  
00401095 E8 76 03 00 00       call        SomeFunction (0401410h)  
0040109A 8B E5                mov         esp,ebp  
...

Assume that you are at a broken state at address 0x0040108D. You know that SomeFunction is not of any interest to you and you don’t want to single step through it. You’d rather get to the more interesting parts at address 0x0040109A and below. So when you’re at 0x00401093, you set a breakpoint at 0x0040109A and continue execution. The CALL instruction at 0x00401095 then runs to completion without being stepped through, and your breakpoint is hit at the instruction immediately following it, so you can continue debugging. Step over effectively wraps these steps into one convenient function provided by a debugger.

In order to perform a step over, the debugger must know what the next instruction is. This is obviously needed because it is the instruction that the user wishes to break at next. The next instruction can be one of a few types:

  1. Invalid
  2. A non-branching instruction (e.g. add/mov/lea/push/…)
  3. A conditional branching instruction (e.g. jz/jge/jb/…)
  4. An unconditional branching instruction (e.g. call/jmp/ret)

If it’s an invalid instruction, then it’s up to the debugger implementation to decide what to do next. In the second case, the next instruction is simply the address of the current one plus the length of the current instruction. The third case is interesting and is also partially implementation defined. If the user is broken on a conditional branch and wishes to step over, how should that be treated? For example, assume the user is looking at the following disassembly listing and is broken on 0x00401219:

00401213 8B 45 F8             mov         eax,dword ptr [a]  
00401216 3B 45 EC             cmp         eax,dword ptr [b]  
00401219 7E 05                jle         test+60h (0401220h)  
0040121B E8 50 FF FF FF       call        d (0401170h)  
00401220 8B F4                mov         esi,esp  

Assume [a] is greater than [b], so the jump will not be taken and the next instruction will be 0x0040121B. The user decides that they want to step over, so they will land at 0x0040121B, which is correct. Now assume the opposite: that [a] is less than or equal to [b]. This means that the branch will be taken and the next address will be 0x00401220. If the user is at 0x00401219 and decides to step over, then what happens? Since 0x0040121B will not be reached, that step over point isn’t necessarily valid. Should execution continue because the step over point will not be reached, or should the debugger “fix” it for the user and break at 0x00401220? Different debuggers do different things here. I would personally go with the latter case just to be safe, especially since the debugger has access to the EFLAGS register and can tell whether the branch will be taken or not prior to execution of the instruction. This particular scenario is left undefined in the example code.
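
Deciding ahead of time whether a conditional branch will be taken only requires the flag bits from the thread context. As a sketch for the JLE case in the listing above, assuming the short 7E rel8 encoding shown there (the function names are hypothetical and not part of the example code):

```cpp
#include <cstdint>

// EFLAGS bit positions defined by the x86 architecture.
constexpr std::uint32_t kZeroFlag = 1u << 6;
constexpr std::uint32_t kSignFlag = 1u << 7;
constexpr std::uint32_t kOverflowFlag = 1u << 11;

// JLE is taken when ZF is set or SF differs from OF.
bool IsJleTaken(std::uint32_t eflags)
{
    const bool zf = (eflags & kZeroFlag) != 0;
    const bool sf = (eflags & kSignFlag) != 0;
    const bool of = (eflags & kOverflowFlag) != 0;
    return zf || (sf != of);
}

// Address where a step over should break for a short JLE (opcode 7E rel8).
std::uint32_t JleStepOverTarget(std::uint32_t address, std::int8_t rel8, std::uint32_t eflags)
{
    const std::uint32_t fallThrough = address + 2; // 7E xx is two bytes long
    return IsJleTaken(eflags) ? fallThrough + static_cast<std::int32_t>(rel8) : fallThrough;
}
```

For the instruction at 0x00401219 with a displacement of 5, this gives 0x0040121B when the flags say the branch falls through and 0x00401220 when they say it is taken. Other conditional jumps test different flag combinations and would need their own predicates.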

The last scenario is that of an unconditional branch. The two unconditional branches that affect implementing step over are JMP (unconditional jump) and RET (return). Under both of these, the point of execution is guaranteed to change: either to the jump destination or to the return address on the stack. Stepping over a RET instruction is pretty useless, because the instruction following it won’t be hit. Likewise, stepping over a JMP instruction will, in the vast majority of cases, also be useless. The point of return from that JMP will most likely not be the instruction following it. For these cases, the example code converts the step over into a step into and follows execution. Having said all of this, the next instruction retrieval function is implemented as follows:

DWORD_PTR Disassembler::GetNextInstruction(const DWORD_PTR dwAddress, bool &bIsUnconditionalBranch)
{
    DWORD_PTR dwNextAddress = 0;
    if (IsInitialized())
    {
        SetDisassembler(dwAddress);
        int iDisasmLength = m_pDisasm(&m_disassembler);
        if (iDisasmLength != UNKNOWN_OPCODE)
        {
            if (m_disassembler.Instruction.BranchType == RetType || m_disassembler.Instruction.BranchType == JmpType)
            {
                bIsUnconditionalBranch = true;
            }
            else
            {
                dwNextAddress = (dwAddress + iDisasmLength);
            }
        }
        else
        {
            fprintf(stderr, "Could not get next instruction. Unknown opcode at %p.\n", (void *)dwAddress);
        }
    }
    else
    {
        fprintf(stderr, "Could not get next instruction. Disassembler Dll was not loaded properly.\n");
    }
 
    return dwNextAddress;
}

with the full StepOver function being implemented as follows:

const bool Debugger::StepOver()
{
    CONTEXT ctx = GetExecutingContext();
    bool bIsUnconditionalBranch = false;
#ifdef _M_IX86
    DWORD_PTR dwStepOverAddress = m_pDisassembler->GetNextInstruction(ctx.Eip, bIsUnconditionalBranch);
#elif defined _M_AMD64
    DWORD_PTR dwStepOverAddress = m_pDisassembler->GetNextInstruction(ctx.Rip, bIsUnconditionalBranch);
#else
#error "Unsupported platform"
#endif
    if (bIsUnconditionalBranch)
    {
        return StepInto();
    }
    else if (dwStepOverAddress != 0)
    {
        m_pStepPoint->Disable();
        m_pStepPoint->ChangeAddress(dwStepOverAddress);
        (void)m_pStepPoint->Enable();
 
        ctx.EFlags &= ~0x100;
        (void)SetExecutingContext(ctx);
 
        return Continue(true);
    }
 
    return false;
}

with m_pStepPoint being a breakpoint to the step over address.

Dump and modify memory

This last piece of functionality is nothing more than an exercise in calling ReadProcessMemory and WriteProcessMemory.

const bool Debugger::PrintBytesAt(const DWORD_PTR dwAddress, size_t ulNumBytes /*= 40*/)
{
    SIZE_T ulBytesRead = 0;
    std::unique_ptr<unsigned char[]> pBuffer = std::unique_ptr<unsigned char[]>(new unsigned char[ulNumBytes]);
    const bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess(), (LPCVOID)dwAddress, pBuffer.get(), ulNumBytes, &ulBytesRead));
    if (bSuccess && ulBytesRead == ulNumBytes)
    {
        for (unsigned int i = 0; i < ulBytesRead; ++i)
        {
            fprintf(stderr, "%02X ", pBuffer.get()[i]);
        }
        fprintf(stderr, "\n");
        return true;
    }
 
    fprintf(stderr, "Could not read memory at %p. Error = %X\n", dwAddress, GetLastError());
    return false;
}
 
const bool Debugger::ChangeByteAt(const DWORD_PTR dwAddress, const unsigned char cNewByte)
{
    SIZE_T ulBytesWritten = 0;
    const bool bSuccess = BOOLIFY(WriteProcessMemory(m_hProcess(), (LPVOID)dwAddress, &cNewByte, sizeof(unsigned char), &ulBytesWritten));
    if (bSuccess && ulBytesWritten == sizeof(unsigned char))
    {
        return true;
    }
 
    fprintf(stderr, "Could not change byte at %p. Error = %X\n", dwAddress, GetLastError());
    return false;
}

Testing the functionality

The same example program as in the previous posts will be used, with minor modifications:

#include <stdio.h>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    int i = 0x1234;
    printf("c called.\n");
    printf("i is at %p with value %X.\n", &i, i);
    d();
    printf("i is at %p with value %X.\n", &i, i);
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

To test memory modification, the i variable can be modified while the program is in a broken state in the d function. Entered commands are in red.

a
[A]ddress or [s]ymbol name? s
Name: d
Received breakpoint at address 00401170.
Press c to continue, s to step into, o to step over.
i
Enter address to print bytes at: 0x18fcac
34 12 00 00 CC CC CC CC 0C AD C2 AA 8C FD 18 00 8A 10 40 00 60 FE 18 00 94 FD 18
 00 00 E0 FD 7F CC CC CC CC CC CC CC CC
e
Enter address to change byte at: 0x18fcac
Enter new byte: 0x12
e
Enter address to change byte at: 0x18fcad
Enter new byte: 0x34
c
Received step at address 00401171

Output from the target application:

Addresses:
a: 00401000
b: 00401050
c: 004010A0
d: 00401170

a called.
b called.
c called.
i is at 0018FCAC with value 1234.
d called.
i is at 0018FCAC with value 3412.
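
The value changes from 1234 to 3412 because x86 stores integers least significant byte first: the bytes 34 12 00 00 at 0x18fcac read back as 0x1234, and writing 12 to the first byte and 34 to the second makes them read back as 0x3412. A small sketch of that reassembly, with ReadLittleEndian32 being a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>

// Assemble a 32-bit value from four bytes stored least significant byte first,
// which is how x86 lays out an int in memory.
std::uint32_t ReadLittleEndian32(const std::uint8_t *bytes)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```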

Disassembly and step over are pretty straightforward to test when lined up with the Visual Studio debugger. For example, below is the disassembly relevant to the a function:

//printf("a called.\n");
00401009 68 48 21 40 00       push        402148h  
0040100E FF 15 94 20 40 00    call        dword ptr ds:[402094h]  
00401014 83 C4 04             add         esp,4  
//b();
00401017 E8 14 00 00 00       call        b (0401030h)  
0040101C 5F                   pop         edi  
}
...

Setting a breakpoint on 0x00401009 and stepping over shows the following behavior in the debugger:

a
[A]ddress or [s]ymbol name? a
Breakpoint address: 0x401009
Received breakpoint at address 00401009.
Press c to continue, s to step into, o to step over.
o
Could not write back original opcode to address 00000000. Error = 1E7
Received breakpoint at address 0040100E.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 00401014.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 00401017.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 0040101C.
Press c to continue, s to step into, o to step over.

Lastly, a disassembly listing for all of this can be displayed:

d
Enter address to print disassembly at: 0x401009
0x00401009 - push 00402148h
0x0040100E - call dword ptr [00402094h]
0x00401014 - add esp, 04h
0x00401017 - call 0067D3A3h
0x0040101C - pop edi
0x0040101D - pop esi
0x0040101E - pop ebx
0x0040101F - mov esp, ebp
0x00401021 - pop ebp
0x00401022 - ret
0x00401023 - int3
0x00401024 - int3
0x00401025 - int3
0x00401026 - int3
0x00401027 - int3

which lines up with what Visual Studio gives.

Wrap up

Writing a debugger may seem like a daunting task, but it is certainly attainable. Aside from the disassembly engine — which can be a whole long series of posts in itself — everything was written from scratch in about 2,000 lines of code (doing a ‘\n’ regex search on the solution yields 2195 lines). Contained within those lines of code is the ability to

  • Add/Remove breakpoints
  • Step into / Step over instructions
  • Continue execution at a breakpoint or step
  • Print / Modify registers
  • Print a call stack
  • Match symbols to addresses / Dump symbols for a module
  • Print / Modify memory
  • Disassemble at an address

While it’s certainly not WinDbg or the Visual Studio debugger, it is an impressive amount for relatively little work. Hopefully those following this series of posts have gained a bit of insight into how the tools that they may use on a frequent basis work and what it takes to develop them. Thanks for reading.

Article Roadmap

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

December 11, 2014

Writing a Primitive Debugger: Part 4 (Symbols)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 7:30 PM

Up to now, we have developed a debugger that can attach and detach from a process, set and remove breakpoints, print registers and a call stack, and modify control flow by changing the executing thread context. These are all pretty essential features of a debugger. The topic of this post, debug symbols, is more of a “nice-to-have”. An application may or may not ship with debug symbols, but in the event that it does, i.e. it’s your own application, then the process of debugging becomes significantly simpler.

Debug Symbols

At its simplest definition, a debug symbol is a piece of information that shows how specific parts of a compiled program map back to the source level. For example, a debug symbol might tell information about the name of a variable at a memory address, or which line of code, and in which file, a series of assembly instructions map to. They are typically generated during debug builds and are used to provide some clarity to a developer that is debugging (or reverse engineering) a piece of code. There is no universal debug symbol format for a language, and they may vary between compilers. On the modern Windows platform, debug symbols come in the form of Program Database (PDB) files, ending with a .pdb extension.

These files hold a lot of useful information about the compiled executable or DLL. As mentioned above, they can contain information regarding which source file and line number (or which object file) a symbol at a certain address maps to. They can contain the names and types of global, static, and local variables, as well as classes and structs. They can also contain information about the compiler optimizations that were used when compiling the code. Some of these things may not be present if the code was compiled with stripped symbols. During a debugging session, the debugger will initialize a symbol handler, look for matching PDB files (recursively in common directories and/or in user-specified directories), and parse* them. When a user is debugging, symbol information can be retrieved and names and source line numbers can be displayed to them (if available).
* This is a useful open source parser that can parse the proprietary format of PDB files.

Implementation

Microsoft provides a very rich set of APIs for handling symbols through the DbgHelp API. There are functions to load/enumerate symbols for a module, find a symbol by name or address, enumerate source file and line references found in PDBs, dynamically add or remove entries from the symbol table, interact with symbol stores, and much more. Given the very large API, I’ve only chosen to demonstrate implementation of the more common features. One thing to consider is that all functions in the DbgHelp API set are single threaded. The example code is single threaded, but it has no synchronization to guarantee that the API is only ever called from one thread at a time, so if you’re building something off of this code, make sure that you add that synchronization yourself.

Initializing a symbol handler is pretty straightforward: it merely involves calling SymInitialize. The function takes a process handle, which is opened by the debugger when it attaches. There is also a parameter for the user search path to locate PDB files, and a third parameter to specify whether the debugger is to enumerate all of the loaded modules in the process and load their symbols as well. For an attaching debugger, whether to enable this behavior depends on the situation. There are cases, such as the debugger creating the target process to debug, or delay-loaded DLLs, that can cause some symbols to not be loaded. Additionally, if this third parameter is set to true and the symbol handler is initialized prior to receiving all of the LOAD_DLL_DEBUG_EVENT events, then some symbols may not be loaded. The sample code defaults this parameter to false, and symbols for modules are loaded in the CREATE_PROCESS_DEBUG_EVENT and LOAD_DLL_DEBUG_EVENT event handlers. This ensures that the symbol files for every module will be properly loaded.

Prior to initializing the symbol handler, the SymSetOptions function should be called, which configures how and what information the symbol handler will load. Simply put into code, the initialization routine looks like the following:

Symbols::Symbols(const HANDLE hProcess, const HANDLE hFile, const bool bLoadAll /*= false*/)
    : m_hProcess{ hProcess }, m_hFile{ hFile }
{
    (void)SymSetOptions(SYMOPT_CASE_INSENSITIVE | SYMOPT_DEFERRED_LOADS |
        SYMOPT_LOAD_LINES | SYMOPT_UNDNAME);
 
    const bool bSuccess = BOOLIFY(SymInitialize(hProcess, nullptr, bLoadAll));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not initialize symbol handler. Error = %X.\n",
            GetLastError());
    }
}

The options here specify that symbol searches will be case insensitive, that symbols won’t be loaded until a reference is made (not to be confused with the delay-loading of DLLs mentioned above), that line information will be loaded, and that symbols will be displayed in an undecorated form. Case insensitivity and undecorated names are there for convenience; it would be annoying to search for exact symbol names such as “?f@@YAHD@Z” otherwise.

When the symbol handler is finished, i.e. the debugger is detaching from the process, a simple call to SymCleanup will terminate the symbol handler:

Symbols::~Symbols()
{
    const bool bSuccess = BOOLIFY(SymCleanup(m_hProcess));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not terminate symbol handler. Error = %X.\n",
            GetLastError());
    }
}

That sets up the initialization and termination of the symbol handler. Time for everything in between.

Enumerating Symbols

One useful feature of a debugger might be to internally enumerate all symbols of a module. This can allow for storage and fast lookup at a later time. Or it can allow for a graphic display for the user and easy navigation to the symbol address from its name. Enumerating symbols is a two-step process: first SymLoadModuleEx is called to load the symbol table for the module, then SymEnumSymbols can be called with the base address of the module. SymEnumSymbols takes a callback of type PSYM_ENUMERATESYMBOLS_CALLBACK as a parameter. This callback will be called for every symbol found in the module’s symbol table and will receive a SYMBOL_INFO structure that holds information about the symbol, such as its name, address, whether it is a register, what value it holds if it’s a constant, etc. Put into code, this is rather straightforward:

const bool Symbols::EnumerateModuleSymbols(const char * const pModulePath, const DWORD64 dwBaseAddress)
{
    DWORD64 dwBaseOfDll = SymLoadModuleEx(m_hProcess, m_hFile, pModulePath, nullptr,
        dwBaseAddress, 0, nullptr, 0);
    if (dwBaseOfDll == 0)
    {
        fprintf(stderr, "Could not load modules for %s. Error = %X.\n",
            pModulePath, GetLastError());
        return false;
    }
 
    UserContext userContext = { this, pModulePath };
    const bool bSuccess = 
       BOOLIFY(SymEnumSymbols(m_hProcess, dwBaseOfDll, "*!*", SymEnumCallback, &userContext));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not enumerate symbols for %s. Error = %X.\n",
            pModulePath, GetLastError());
    }
 
    return bSuccess;
}
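The callback-plus-user-context pattern is what makes the “storage and fast lookup” idea work: the context pointer carries the container that the callback fills. Below is a minimal portable sketch of that pattern. It does not use DbgHelp; MockSymbol, EnumerateMockSymbols, SymbolCache, and StoreSymbol are hypothetical stand-ins, and the real callback takes a PSYMBOL_INFO plus a symbol size rather than this simplified signature. The names and addresses are the ones from the sample output later in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for the SYMBOL_INFO fields used here.
struct MockSymbol
{
    const char *name;
    std::uint64_t address;
};

// C-style callback shape: returning false stops the enumeration early.
using EnumCallback = bool (*)(const MockSymbol *symbol, void *userContext);

// Hypothetical enumerator playing the role of SymEnumSymbols.
inline bool EnumerateMockSymbols(const std::vector<MockSymbol> &table,
    EnumCallback callback, void *userContext)
{
    assert(callback != nullptr);
    for (const auto &symbol : table)
    {
        if (!callback(&symbol, userContext))
        {
            return false;
        }
    }
    return true;
}

// The user context: two indexes so later lookups work by name or by address.
struct SymbolCache
{
    std::map<std::string, std::uint64_t> byName;
    std::map<std::uint64_t, std::string> byAddress;
};

// Callback that recovers the cache from the opaque pointer and stores the symbol.
inline bool StoreSymbol(const MockSymbol *symbol, void *userContext)
{
    auto *cache = static_cast<SymbolCache *>(userContext);
    cache->byName[symbol->name] = symbol->address;
    cache->byAddress[symbol->address] = symbol->name;
    return true;
}

inline SymbolCache BuildDemoCache()
{
    const std::vector<MockSymbol> table = {
        { "a", 0x00401000 }, { "b", 0x00401030 }, { "c", 0x00401060 },
        { "d", 0x00401090 }, { "main", 0x004010B0 } };

    SymbolCache cache;
    EnumerateMockSymbols(table, StoreSymbol, &cache);
    return cache;
}
```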

Resolving Symbols

There are several ways to resolve symbols, but the two most common are by name and by address. This can be achieved by calling SymFromName and SymFromAddr respectively. Both of these populate a SYMBOL_INFO structure, just as calling SymEnumSymbols does. Invoking them is also rather straightforward:

const bool Symbols::SymbolFromAddress(const DWORD64 dwAddress, const SymbolInfo **pFullSymbolInfo)
{
    char pBuffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(char)] = { 0 };
    PSYMBOL_INFO pSymInfo = (PSYMBOL_INFO)pBuffer;
 
    pSymInfo->SizeOfStruct = sizeof(SYMBOL_INFO);
    pSymInfo->MaxNameLen = MAX_SYM_NAME;
 
    DWORD64 dwDisplacement = 0;
    const bool bSuccess = BOOLIFY(SymFromAddr(m_hProcess, dwAddress, &dwDisplacement, pSymInfo));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not retrieve symbol from address %p. Error = %X.\n",
            (DWORD_PTR)dwAddress, GetLastError());
        return false;
    }
 
    fprintf(stderr, "Symbol found at %p. Name: %.*s. Base address of module: %p\n",
        (DWORD_PTR)dwAddress, pSymInfo->NameLen, pSymInfo->Name, (DWORD_PTR)pSymInfo->ModBase);
 
    *pFullSymbolInfo = FindSymbolByName(pSymInfo->Name);
 
    return bSuccess;
}
 
const bool Symbols::SymbolFromName(const char * const pName, const SymbolInfo **pFullSymbolInfo)
{
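    // SYMBOL_INFO ends in a one-character Name array, so the buffer needs room
    // for the structure followed by up to MAX_SYM_NAME characters.
    char pBuffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(char)] = { 0 };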
    PSYMBOL_INFO pSymInfo = (PSYMBOL_INFO)pBuffer;
 
    pSymInfo->SizeOfStruct = sizeof(SYMBOL_INFO);
    pSymInfo->MaxNameLen = MAX_SYM_NAME;
 
    const bool bSuccess = BOOLIFY(SymFromName(m_hProcess, pName, pSymInfo));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not retrieve symbol for name %s. Error = %X.\n",
            pName, GetLastError());
        return false;
    }
 
    fprintf(stderr, "Symbol found for %s. Name: %.*s. Address: %p. Base address of module: %p\n",
        pName, pSymInfo->NameLen, pSymInfo->Name, (DWORD_PTR)pSymInfo->Address,
        (DWORD_PTR)pSymInfo->ModBase);
 
    *pFullSymbolInfo = FindSymbolByAddress((DWORD_PTR)pSymInfo->Address);
 
    return bSuccess;
}

with the SymbolInfo structure being an extended structure that holds information about source files and line numbers (see example code).
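To make the address lookup and its displacement more concrete, here is a minimal portable sketch of resolving an address against a table of symbol start addresses. It is not how DbgHelp is implemented; ResolvedSymbol, ResolveAddress, and DemoSymbols are hypothetical names, and the sketch ignores symbol sizes, so any address past the last symbol resolves to that symbol. The table uses the addresses from the sample output below.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

struct ResolvedSymbol
{
    std::string name;
    std::uint64_t address;      // start address of the symbol
    std::uint64_t displacement; // queried address minus the start address
};

// Finds the symbol with the greatest start address that is <= address.
inline bool ResolveAddress(const std::map<std::uint64_t, std::string> &symbolsByAddress,
    std::uint64_t address, ResolvedSymbol &result)
{
    // upper_bound returns the first symbol that starts after the address,
    // so the symbol containing the address is the one just before it.
    auto next = symbolsByAddress.upper_bound(address);
    if (next == symbolsByAddress.begin())
    {
        return false; // the address lies below every known symbol
    }

    auto found = std::prev(next);
    assert(found->first <= address);

    result.name = found->second;
    result.address = found->first;
    result.displacement = address - found->first;
    return true;
}

inline std::map<std::uint64_t, std::string> DemoSymbols()
{
    return { { 0x00401000, "a" }, { 0x00401030, "b" }, { 0x00401060, "c" },
        { 0x00401090, "d" }, { 0x004010B0, "main" } };
}
```

With this table, the return address 0040107C resolves to c, which starts at 00401060, giving a displacement of 1C.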

Testing the functionality

To test this functionality, we can take the sample program from the previous post (reproduced below) and see the difference in how call stacks look. The new functionality in this version has added the ability to resolve symbols for the addresses in the callstack. Also, the debugger was augmented to add two new abilities: to dump all symbols from a module, and to set/remove breakpoints on a symbol by name.

#include <cstdio>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    printf("c called.\n");
    d();
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

Setting a breakpoint on the d function and printing the call stack shows the improvement over the previous version of the debugger. Entered commands are shown in red, while new symbol information is shown in orange.

a
[A]ddress or [s]ymbol name? s
Name: d
Received breakpoint at address 00401090.
Press c to continue or s to begin stepping.
l
Frame: 0
Execution address: 00401090
Stack address: 00000000
Frame address: 0018FDE8
Symbol name: d
Symbol address: 00401090
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 4
Frame: 1
Execution address: 0040107C
Stack address: 00000000
Frame address: 0018FDEC
Symbol found at 0040107C. Name: c. Base address of module: 00400000
Symbol name: c
Symbol address: 00401060
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 9
Frame: 2
Execution address: 0040104C
Stack address: 00000000
Frame address: 0018FE40
Symbol found at 0040104C. Name: b. Base address of module: 00400000
Symbol name: b
Symbol address: 00401030
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 15
Frame: 3
Execution address: 0040101C
Stack address: 00000000
Frame address: 0018FE94
Symbol found at 0040101C. Name: a. Base address of module: 00400000
Symbol name: a
Symbol address: 00401000
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 21
Frame: 4
Execution address: 004010EF
Stack address: 00000000
Frame address: 0018FEE8
Symbol found at 004010EF. Name: main. Base address of module: 00400000
Symbol name: main
Symbol address: 004010B0
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 27
Frame: 5
Execution address: 004013A9
Stack address: 00000000
Frame address: 0018FF3C
Symbol found at 004013A9. Name: __tmainCRTStartup. Base address of module: 00400000
Symbol name: __tmainCRTStartup
Symbol address: 00401210
Address displacement: 0
Source file: f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c
Line number: 473
Frame: 6
Execution address: 004014ED
Stack address: 00000000
Frame address: 0018FF8C
Symbol found at 004014ED. Name: mainCRTStartup. Base address of module: 00400000

Symbol name: mainCRTStartup
Symbol address: 004014E0
Address displacement: 0
Source file: f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c
Line number: 456
Frame: 7
Execution address: 76AE919F
Stack address: 00000000
Frame address: 0018FF94
Symbol found at 76AE919F. Name: BaseThreadInitThunk. Base address of module: 00000000
Symbol name: BaseThreadInitThunk
Symbol address: 76AE9191
Address displacement: 0
Source file: (null)
Line number: 0
Frame: 8
Execution address: 77430BBB
Stack address: 00000000
Frame address: 0018FFA0
Symbol found at 77430BBB. Name: RtlInitializeExceptionChain. Base address of module: 00000000
Symbol name: RtlInitializeExceptionChain
Symbol address: 77430B37
Address displacement: 0
Source file: (null)
Line number: 0
Frame: 9
Execution address: 77430B91
Stack address: 00000000
Frame address: 0018FFE4
Symbol found at 77430B91. Name: RtlInitializeExceptionChain. Base address of module: 00000000
Symbol name: RtlInitializeExceptionChain
Symbol address: 77430B37
Address displacement: 0
Source file: (null)
Line number: 0
StackWalk64 finished.

This looks much more useful compared to just getting absolute addresses as in the previous version. Here, for some symbols, the source files can be found on the host machine and be presented to the user alongside the raw assembly. Additionally, symbols can be printed for any module as shown below:

y
Enter in module name to dump symbols for: kernel32.dll
Symbol name: QuirkIsEnabledWorker
Symbol address: 76AE0010
Address displacement: 0
Source file: (null)
Line number: 0
Symbol name: EnumCalendarInfoExEx
Symbol address: 76AE03BD
Address displacement: 0
Source file: (null)
Line number: 0
Symbol name: GetFileMUIPath
Symbol address: 76AE03CE
Address displacement: 0
Source file: (null)
Line number: 0
...

That concludes the topic of symbols. The implementation presented here only scratches the surface of what is available in the DbgHelp API, and I recommend that those interested further explore the MSDN documentation on the topic. The next article will conclude the series with a collection of miscellaneous features that debuggers typically possess. That piece will probably include the ability to step over code (step into is currently implemented), present a disassembly listing to the user for x86 and x64, and allow for modification of arbitrary memory, instead of just registers and/or a thread context.

Article Roadmap
Future posts will cover topics closely following the items below:

  • Basics
  • Adding/Removing Breakpoints, Single-stepping
  • Call Stack, Registers, Contexts
  • Symbols
  • Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

December 5, 2014

Writing a Primitive Debugger: Part 3 (Call Stack, Registers, Contexts)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 6:47 PM

Up to now, all of the functionality discussed in writing a debugger has been related to getting a debugger attached to a process and being able to set breakpoints and perform single stepping. While certainly useful, this functionality is more passive debugging: you can break the state of the process at a certain point and instrument it at the instruction level, but you cannot actually modify any behavior, or even view how the process got to that state. The next core functionality that will be covered will detail actually being able to view and change program execution state (in the form of the thread context, namely registers), and being able to view the thread’s call stack upon hitting a breakpoint.

Thread Contexts

A thread context, as defined for Windows, “includes the thread’s set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread’s process.” For a usermode debugger, which is what is being developed in these posts, the important parts are the machine registers and the user stack. The thread environment block is also accessible from usermode but won’t be covered here due to its undocumented and very specific nature. When a process starts up, the loader will set up the process’s main thread and begin execution at the entry point. This main thread can in turn launch additional threads, which themselves launch threads, and so on. Each of these threads will have its own context containing the items listed above.

The purpose of these contexts is that Windows, being a preemptive multitasking operating system, can have any [usermode] task, such as a thread executing code, interrupted at any point in time. During these interruptions, a context switch will be carried out, which is simply the process of saving the current execution context and setting the new one to execute. Eventually, when the original task is scheduled to resume, a context switch will again occur back to the context of the original thread and it will continue executing as if nothing had happened. What do these contexts look like? The answer is that it is entirely processor-specific, which shouldn’t be too surprising given that they store registers.
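The save-and-restore idea behind a context switch can be sketched in a few lines of portable C++. This is only a toy model, not how the Windows scheduler works; MockRegisters, MockTask, MockCpu, and ContextSwitch are hypothetical names, and the register set is cut down to three fields.

```cpp
#include <cassert>
#include <cstdint>

// A drastically reduced register set standing in for a real thread context.
struct MockRegisters
{
    std::uint32_t Eax;
    std::uint32_t Esp;
    std::uint32_t Eip;
};

// Each task owns the saved copy of its registers while it is not running.
struct MockTask
{
    MockRegisters savedContext;
};

// The single set of live registers on the (mock) processor.
struct MockCpu
{
    MockRegisters registers;
};

// Saves the live registers into the outgoing task, then loads the incoming one.
inline void ContextSwitch(MockCpu &cpu, MockTask &outgoing, const MockTask &incoming)
{
    assert(&outgoing != &incoming);
    outgoing.savedContext = cpu.registers;
    cpu.registers = incoming.savedContext;
}

// Interrupts the first task, lets the second one run, then resumes the first.
inline MockRegisters RunInterruptedThenResumed()
{
    MockTask first{ { 1, 0x0018FF00, 0x00401000 } };
    MockTask second{ { 2, 0x0028FF00, 0x00402000 } };

    MockCpu cpu{ first.savedContext };  // the first task is running
    cpu.registers.Eax = 42;             // ...and does some work

    ContextSwitch(cpu, first, second);  // the first task is interrupted
    cpu.registers.Eax = 99;             // the second task changes the live registers

    ContextSwitch(cpu, second, first);  // the first task is scheduled again
    return cpu.registers;
}
```

After the second switch the live registers are exactly what the first task left behind, which is why it can continue as if nothing had happened.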

In Windows, the part of the thread context that is available to developers comes defined as a CONTEXT structure in winnt.h. For example, below is a snippet from a CONTEXT structure for x86 processors.

typedef struct _CONTEXT {
    DWORD   Dr0;
    DWORD   Dr1;
    DWORD   Dr2;
    ...
    FLOATING_SAVE_AREA FloatSave;
    DWORD   SegGs;
    DWORD   SegFs;
    ...
    DWORD   Edi;
    DWORD   Esi;
    DWORD   Ebx;
    ...
    DWORD   Ebp;
    DWORD   Eip;
    ...

The x64 version looks closely related, with register widths extended to 64 bits and additional registers and extensions added.

typedef struct DECLSPEC_ALIGN(16) _CONTEXT {
    DWORD ContextFlags;
    DWORD MxCsr;
    ...
    WORD   SegGs;
    WORD   SegSs;
    DWORD EFlags;
    ...
    DWORD64 Rax;
    DWORD64 Rcx;
    DWORD64 Rdx;
    ...
    DWORD64 R13;
    DWORD64 R14;
    DWORD64 R15;
    ...

This is the structure that will be the most useful to inspect and modify when debugging. A debugger should be able to print out this structure and allow for modification of any of its fields. Fortunately, there are two very useful APIs for retrieving and modifying this structure: GetThreadContext and SetThreadContext. These have been covered previously when discussing how to enable single-stepping, where the context had to be retrieved and the EFlags register modified. So what modifications are needed to the existing code/logic in order to add this functionality? It’s as simple as opening a handle to the currently executing (or in the debugger’s case, broken) thread and retrieving/setting the context.

const CONTEXT Debugger::GetExecutingContext()
{
    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_ALL;
    SafeHandle hThread = OpenCurrentThread();
    if (hThread.IsValid())
    {
        bool bSuccess = BOOLIFY(GetThreadContext(hThread(), &ctx));
        if (!bSuccess)
        {
            fprintf(stderr, "Could not get context for thread %X. Error = %X\n", m_dwExecutingThreadId, GetLastError());
        }
    }
 
    memcpy(&m_lastContext, &ctx, sizeof(CONTEXT));
 
    return ctx;
}
 
const bool Debugger::SetExecutingContext(const CONTEXT &ctx)
{
    bool bSuccess = false;
    SafeHandle hThread = OpenCurrentThread();
    if (hThread.IsValid())
    {
        bSuccess = BOOLIFY(SetThreadContext(hThread(), &ctx));
    }
 
    memcpy(&m_lastContext, &ctx, sizeof(CONTEXT));
 
    return bSuccess;
}

For each access or modification, a handle is opened (and closed) to the current thread; this certainly isn’t the most efficient approach, but it serves well enough for demo purposes. The state of the context is then stored in m_lastContext. These functions are invoked when the process receives an EXCEPTION_BREAKPOINT and when single stepping the process, i.e. handling the EXCEPTION_SINGLE_STEP exception. Therefore, m_lastContext will always have the appropriate register values in the context structure when a breakpoint is hit or when the user is single stepping. These functions can also be invoked when the user wants to modify a certain register or registers through the debugger interface. Printing the context involves nothing more than printing out the values in the structure. I’ve chosen to only print out the more commonly used registers for the example code:

void Debugger::PrintContext()
{
#ifdef _M_IX86
    fprintf(stderr, "EAX: %p EBX: %p ECX: %p EDX: %p\n"
        "ESP: %p EBP: %p ESI: %p EDI: %p\n"
        "EIP: %p FLAGS: %X\n",
        m_lastContext.Eax, m_lastContext.Ebx, m_lastContext.Ecx, m_lastContext.Edx,
        m_lastContext.Esp, m_lastContext.Ebp, m_lastContext.Esi, m_lastContext.Edi,
        m_lastContext.Eip, m_lastContext.EFlags);
#elif defined _M_AMD64
    fprintf(stderr, "RAX: %p RBX: %p RCX: %p RDX: %p\n"
        "RSP: %p RBP: %p RSI: %p RDI: %p\n"
        "R8: %p R9: %p R10: %p R11: %p\n"
        "R12: %p R13: %p R14: %p R15: %p\n"
        "RIP: %p FLAGS: %X\n",
        m_lastContext.Rax, m_lastContext.Rbx, m_lastContext.Rcx, m_lastContext.Rdx,
        m_lastContext.Rsp, m_lastContext.Rbp, m_lastContext.Rsi, m_lastContext.Rdi,
        m_lastContext.R8, m_lastContext.R9, m_lastContext.R10, m_lastContext.R11,
        m_lastContext.R12, m_lastContext.R13, m_lastContext.R14, m_lastContext.R15,
        m_lastContext.Rip, m_lastContext.EFlags);
#else
#error "Unsupported architecture"
#endif
}
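Modifying a register always follows the same get, modify, set sequence, and toggling the trap flag for single-stepping is the simplest instance of it. The sketch below is portable and uses mock types rather than the real CONTEXT, GetThreadContext, and SetThreadContext; MockContext, MockThread, and SetSingleStep are hypothetical names. The trap flag being bit 8 of EFLAGS (mask 0x100) is the one architectural fact it relies on.

```cpp
#include <cassert>
#include <cstdint>

// Mock of the two CONTEXT fields used here; the real structure is in winnt.h.
struct MockContext
{
    std::uint32_t Eip;
    std::uint32_t EFlags;
};

// Bit 8 of EFLAGS is the trap flag (TF), which raises a single-step
// exception after the next instruction executes.
constexpr std::uint32_t kTrapFlag = 0x100;

// Stand-in for a suspended thread whose context can be read and written.
class MockThread
{
public:
    explicit MockThread(const MockContext &initial) : m_context(initial) { }

    MockContext GetContext() const { return m_context; }
    void SetContext(const MockContext &ctx) { m_context = ctx; }

private:
    MockContext m_context;
};

// Retrieves the context, changes only the trap flag, and writes it back.
inline void SetSingleStep(MockThread &thread, const bool enable)
{
    MockContext ctx = thread.GetContext();
    if (enable)
    {
        ctx.EFlags |= kTrapFlag;
    }
    else
    {
        ctx.EFlags &= ~kTrapFlag;
    }
    thread.SetContext(ctx);

    assert(((thread.GetContext().EFlags & kTrapFlag) != 0) == enable);
}
```

The same pattern applies to any other field: copy the context out, change the one value the user asked for, and write the whole structure back.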

Call Stacks

At the lowest level, the scope of a function is defined by its stack frame. This is a compiler and/or ABI defined construct for how the state of the function will be laid out. A stack frame typically includes the return address of the caller, any parameters that were passed to the function from the caller, and space for local variables that exist within the scope of the function. For x86 and x64, among other architectures, these stack frames are set up by a prologue, which is the code responsible for moving the stack and frame pointers (ESP/EBP or RSP/RBP) from the caller to the callee. Prior to the callee returning, there is an epilogue, which is responsible for returning the stack and frame pointers to those of the caller. For example, consider the following C function:

void TestFunction(int a, int b, int c)
{
    int d = 4, e = 5, f = 6;
}

which was called in the following way

push        3  
push        2  
push        1  
call        TestFunction

Disassembled as x86, this becomes:


push        ebp  
mov         ebp,esp  
sub         esp,0Ch  
mov         dword ptr [ebp-4],4  
mov         dword ptr [ebp-8],5  
mov         dword ptr [ebp-0Ch],6  
mov         esp,ebp  
pop         ebp  
ret         0Ch  

The prologue (push ebp, mov ebp,esp, sub esp,0Ch) and the epilogue (mov esp,ebp, pop ebp, ret 0Ch) bracket the function body. After the execution of the prologue, the stack frame for this function will contain the caller’s frame pointer in [EBP], the return address at [EBP+4] (because the CALL instruction implicitly pushes the address of the next instruction on the stack before changing execution), and the passed parameters at [EBP+8], [EBP+12], and [EBP+16]. The prologue subtracted 12 from the stack pointer to make room for local variables: the three 32-bit ints declared within the function. These will be at [EBP-4], [EBP-8], and [EBP-12], as can be seen in the disassembly.
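The offsets above can be checked by replaying the instructions against a toy model of the stack. The sketch below is portable C++ with hypothetical names (MockStack, BuildTestFunctionFrame); the starting ESP/EBP values and the return address are made up, and only the relative offsets matter.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of a 32-bit stack: memory maps an address to a 32-bit value.
struct MockStack
{
    std::map<std::uint32_t, std::uint32_t> memory;
    std::uint32_t esp;
    std::uint32_t ebp;

    // PUSH: move the stack pointer down by four bytes, then store the value.
    void Push(const std::uint32_t value)
    {
        assert(esp >= 4);
        esp -= 4;
        memory[esp] = value;
    }

    // Reads the value at [EBP + offset]; the offset may be negative.
    std::uint32_t ReadFromFrame(const std::int32_t offset) const
    {
        return memory.at(ebp + offset);
    }
};

// Replays the call sequence and prologue of TestFunction(1, 2, 3).
inline MockStack BuildTestFunctionFrame()
{
    MockStack stack{ {}, 0x0018FF00, 0x0018FF40 }; // made-up caller ESP and EBP

    stack.Push(3);          // push 3
    stack.Push(2);          // push 2
    stack.Push(1);          // push 1
    stack.Push(0x00401234); // call TestFunction (made-up return address)

    stack.Push(stack.ebp);  // push ebp
    stack.ebp = stack.esp;  // mov ebp,esp
    stack.esp -= 0x0C;      // sub esp,0Ch

    stack.memory[stack.ebp - 4] = 4;    // mov dword ptr [ebp-4],4
    stack.memory[stack.ebp - 8] = 5;    // mov dword ptr [ebp-8],5
    stack.memory[stack.ebp - 0x0C] = 6; // mov dword ptr [ebp-0Ch],6

    return stack;
}
```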

This setup is pretty convenient because it offers easy distinction between what is a parameter and what is a local variable. Debugging becomes a bit easier since everything is held on the stack and indexed through the frame pointer, rather than scattered around between registers and the stack. This changes a bit as you go from x86 to x64, where x64 will store the first four (or six, depending on your compiler/platform) arguments in registers, and the rest on the stack. This can also change a bit depending on calling conventions and compiler optimizations, especially frame-pointer omission.

Since the stack frame stores the return address of the caller, it is possible to see where the function was called from. That is what the call stack is: a collection of stack frames that represent the call chain in the code leading up to the current stack frame. This information is very useful to have in terms of debugging, because a bug that presented itself in one function may have originated earlier on in the code. Being able to quickly traverse frames, and see the values within those frames, is an invaluable aid to debugging.
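When every function keeps a frame pointer, collecting the call chain amounts to following the saved EBP links. Below is a minimal portable sketch of that walk over a mock memory map. It is not how StackWalk64 is implemented (which also has to cope with frame-pointer omission and x64 unwind data); Memory, WalkFramePointerChain, and BuildDemoMemory are hypothetical names, the frame addresses are made up, and the return addresses are taken from the sample program’s output elsewhere in this series.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Mock 32-bit memory: maps an address to the 32-bit value stored there.
using Memory = std::map<std::uint32_t, std::uint32_t>;

// Follows the chain of saved frame pointers: [ebp] holds the caller's EBP
// and [ebp+4] holds the return address into the caller.
inline std::vector<std::uint32_t> WalkFramePointerChain(const Memory &memory,
    std::uint32_t ebp, const std::size_t maxFrames)
{
    std::vector<std::uint32_t> returnAddresses;
    while (ebp != 0 && returnAddresses.size() < maxFrames)
    {
        const auto savedEbp = memory.find(ebp);
        const auto returnAddress = memory.find(ebp + 4);
        if (savedEbp == memory.end() || returnAddress == memory.end())
        {
            break; // unreadable memory: stop rather than guess
        }

        returnAddresses.push_back(returnAddress->second);

        // The stack grows downward, so a caller's frame must sit at a higher
        // address. Anything else is a corrupt chain and could loop forever.
        if (savedEbp->second != 0 && savedEbp->second <= ebp)
        {
            break;
        }
        ebp = savedEbp->second;
    }

    return returnAddresses;
}

// Frames for main -> a -> b -> c -> d, with execution stopped inside d.
inline Memory BuildDemoMemory()
{
    return {
        { 0x0018FD00, 0x0018FD20 }, { 0x0018FD04, 0x0040107C },   // d's frame, returns into c
        { 0x0018FD20, 0x0018FD40 }, { 0x0018FD24, 0x0040104C },   // c's frame, returns into b
        { 0x0018FD40, 0x0018FD60 }, { 0x0018FD44, 0x0040101C },   // b's frame, returns into a
        { 0x0018FD60, 0x00000000 }, { 0x0018FD64, 0x004010EF } }; // a's frame, returns into main
}
```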

On the Windows platform, there is a convenient function that performs the tedium/annoyance of walking stack frames backwards: StackWalk64. This function is x86 and x64 compatible, but does require some setup prior to being invoked. Given the very machine-specific layout of stack frames, the StackWalk64 function requires filling out a STACKFRAME64 structure, which will be passed to it as an argument. Filling out this structure merely involves setting the instruction, frame, and stack pointers, along with the address modes, which will be flat addressing for the case of modern Windows on x86 and x64. Once this structure is set up, StackWalk64 can be called in a loop to retrieve the frames. Put into code, it looks like the following:

void Debugger::PrintCallStack()
{
    STACKFRAME64 stackFrame = { 0 };
    const DWORD_PTR dwMaxFrames = 50;
    CONTEXT ctx = GetExecutingContext();
 
    stackFrame.AddrPC.Mode = AddrModeFlat;
    stackFrame.AddrFrame.Mode = AddrModeFlat;
    stackFrame.AddrStack.Mode = AddrModeFlat;
 
#ifdef _M_IX86
    DWORD dwMachineType = IMAGE_FILE_MACHINE_I386;
    stackFrame.AddrPC.Offset = ctx.Eip;
    stackFrame.AddrFrame.Offset = ctx.Ebp;
    stackFrame.AddrStack.Offset = ctx.Esp;
#elif defined _M_AMD64
    DWORD dwMachineType = IMAGE_FILE_MACHINE_AMD64;
    stackFrame.AddrPC.Offset = ctx.Rip;
    stackFrame.AddrFrame.Offset = ctx.Rbp;
    stackFrame.AddrStack.Offset = ctx.Rsp;
#else
#error "Unsupported platform"
#endif
 
    SafeHandle hThread = OpenCurrentThread();
    for (int i = 0; i < dwMaxFrames; ++i)
    {
        const bool bSuccess = BOOLIFY(StackWalk64(dwMachineType, m_hProcess(), hThread(), &stackFrame,
            (dwMachineType == IMAGE_FILE_MACHINE_I386 ? nullptr : &ctx), nullptr,
            SymFunctionTableAccess64, SymGetModuleBase64, nullptr));
        if (!bSuccess || stackFrame.AddrPC.Offset == 0)
        {
            fprintf(stderr, "StackWalk64 finished.\n");
            break;
        }
 
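        // The offsets are DWORD64, so cast them to pointer size to make each %p
        // consume exactly one argument. Without the casts, a 32-bit build
        // misreads the arguments: it prints 00000000 as the stack address and
        // the stack pointer as the frame address.
        fprintf(stderr, "Frame: %X\n"
            "Execution address: %p\n"
            "Stack address: %p\n"
            "Frame address: %p\n",
            i, (DWORD_PTR)stackFrame.AddrPC.Offset,
            (DWORD_PTR)stackFrame.AddrStack.Offset, (DWORD_PTR)stackFrame.AddrFrame.Offset);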
    }
}

Testing the functionality

To test this functionality we can create another demo app that will be used as the debug target. The simple one below is what I used:

#include <cstdio>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    printf("c called.\n");
    d();
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

I would recommend disabling incremental linking and ASLR (on the executable, not the system) for convenience’s sake. Below is the stack trace that Visual Studio produces when a breakpoint is set inside the d function and hit.

Demo.exe!d() Line 5	C++
Demo.exe!c() Line 14	C++
Demo.exe!b() Line 20	C++
Demo.exe!a() Line 26	C++
Demo.exe!main(int argc, char * * argv) Line 41	C++
Demo.exe!__tmainCRTStartup() Line 626	C
Demo.exe!mainCRTStartup() Line 466	C
kernel32.dll!@BaseThreadInitThunk@12()	Unknown
ntdll.dll!___RtlUserThreadStart@8()	Unknown
ntdll.dll!__RtlUserThreadStart@8()	Unknown

Attaching with the debugger also yields 10 frames, as listed below:

a
Target address: 0x4010f0
Received breakpoint at address 004010F0
Press c to continue or s to begin stepping.
l
Frame: 0
Execution address: 004010F0
Stack address: 00000000
Frame address: 0018FBE4
Frame: 1
Execution address: 004010DA
Stack address: 00000000
Frame address: 0018FBE8
Frame: 2
Execution address: 0040108A
Stack address: 00000000
Frame address: 0018FCBC
Frame: 3
Execution address: 0040103A
Stack address: 00000000
Frame address: 0018FD90
Frame: 4
Execution address: 004011C6
Stack address: 00000000
Frame address: 0018FE64
Frame: 5
Execution address: 00401699
Stack address: 00000000
Frame address: 0018FF38
Frame: 6
Execution address: 004017DD
Stack address: 00000000
Frame address: 0018FF88
Frame: 7
Execution address: 75D5338A
Stack address: 00000000
Frame address: 0018FF90
Frame: 8
Execution address: 77339F72
Stack address: 00000000
Frame address: 0018FF9C
Frame: 9
Execution address: 77339F45
Stack address: 00000000
Frame address: 0018FFDC
StackWalk64 finished.

The output is a bit less elegant than the Visual Studio debugger, but it is correct, which is the more important part. It would be nice, however, to put names to some of those addresses. That is where symbol loading and mapping come in, which will be the subject of the next post.

Article Roadmap
Future posts will cover topics closely following the items below:

  • Basics
  • Adding/Removing Breakpoints, Single-stepping
  • Call Stack, Registers, Contexts
  • Symbols
  • Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

December 4, 2014

Twitter Time

Filed under: Uncategorized — admin @ 12:13 AM

Because why not… Follow here at your choosing: https://twitter.com/codereversing
