RCE Endeavors 😅

April 4, 2015

Hiding Functionality with Exception Handlers (2/2)

Filed under: General x86,Programming,Reverse Engineering — admin @ 1:49 PM

This post will cover the second part of hiding functionality with exception handlers. Unlike the technique presented in the previous post, which modified the SEH record for the local thread, the aim here is to modify the SEH record for another thread in order to better hide what is actually going on. By the end of the post, there should be enough information to put together a working application capable of modifying the SEH list of any thread (barring some exceptions) and causing it to raise an exception to execute your code. The sample application will be a DLL that is injected into a process and hijacks one of its threads to perform some task.

What is the purpose of doing all of this if you’re injecting into a process anyway? After all, you can simply spawn your own thread, or likely reuse the one created during the injection (if CreateRemoteThread was used), and just begin executing your code. I’d argue that this technique adds more obscurity to what is happening during static analysis and is something out of the norm. Plus, it’s fun!

The overall code is very similar to what the first part showed, but now a few steps need to be added in order to get the TIB of another thread. There are a few different approaches, of varying complexity and reliability.

  • Do it directly. Suspend the thread and get its context. Change the instruction pointer to point to your code, which changes the SEH list and raises an exception, then resume the thread. Perform your task and restore the original context in your SEH handler.
  • Do it indirectly. Suspend the thread, queue an asynchronous procedure call (APC) with QueueUserAPC that changes the SEH list and raises an exception, and resume the thread. The APC only runs once the thread enters an alertable wait (for example, SleepEx or WaitForSingleObjectEx with the alertable flag set), which is the case for many, though not all, threads in a process.
  • Take the middle ground. Suspend the thread and get the address of its TIB directly by passing its FS selector to GetThreadSelectorEntry. Change the SEH list from within your own thread, queue an APC to raise the exception, and resume the thread.
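GetThreadSelectorEntry fills an LDT_ENTRY in which the 32-bit segment base is split across three fields: BaseLow holds bits 0-15, HighWord.Bits.BaseMid holds bits 16-23, and HighWord.Bits.BaseHi holds bits 24-31. The portable sketch below reassembles a base address from those three pieces using fixed-width integers instead of the Windows LDT_ENTRY type; ComposeSegmentBase is a hypothetical name, not part of any API.

```cpp
#include <cstdint>

// Hypothetical helper: rebuild a 32-bit segment base from the three pieces a
// segment descriptor stores it in (exposed by LDT_ENTRY as BaseLow,
// HighWord.Bits.BaseMid and HighWord.Bits.BaseHi).
std::uint32_t ComposeSegmentBase(std::uint16_t baseLow, std::uint8_t baseMid,
                                 std::uint8_t baseHi)
{
    return (static_cast<std::uint32_t>(baseHi) << 24) |   // bits 24-31
           (static_cast<std::uint32_t>(baseMid) << 16) |  // bits 16-23
           static_cast<std::uint32_t>(baseLow);           // bits 0-15
}
```

For example, BaseLow = 0xE000, BaseMid = 0xFD and BaseHi = 0x7F give 0x7FFDE000. This is the same shift-and-OR that InstallExceptionHandler applies to the real LDT_ENTRY.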

The easiest approach is to do it indirectly with an APC. The code is really straightforward and looks like the following:

void InstallExceptionHandler(DWORD dwThreadId)
{
    auto handle = ThreadHandleTable[dwThreadId];
 
    DWORD dwError = SuspendThread(handle);
    if (dwError == (DWORD)-1)
    {
        fprintf(stderr, "Could not suspend thread. Error = %X.\n",
            GetLastError());
        return;
    }
 
    //Debug output only: locate the target thread's TIB through its FS selector.
    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_SEGMENTS;
    LDT_ENTRY ldtEntry = { 0 };
    if (GetThreadContext(handle, &ctx) &&
        GetThreadSelectorEntry(handle, ctx.SegFs, &ldtEntry))
    {
        const DWORD dwFSAddress =
            (ldtEntry.HighWord.Bits.BaseHi << 24) |
            (ldtEntry.HighWord.Bits.BaseMid << 16) |
            (ldtEntry.BaseLow);
 
        fprintf(stderr, "FS segment address of target thread should be: %X.\n",
            dwFSAddress);
    }
 
    //APCProc runs on the target thread the next time it enters an alertable wait.
    dwError = QueueUserAPC(APCProc, handle, 0);
    if (dwError == 0)
    {
        fprintf(stderr, "Could not queue APC to thread. Error = %X.\n",
            GetLastError());
    }
 
    dwError = ResumeThread(handle);
    if (dwError == (DWORD)-1)
    {
        fprintf(stderr, "Could not resume thread. Error = %X.\n",
            GetLastError());
    }
}

Here the suspend/queue/resume steps are translated directly into code (with extra debug output). Once the thread resumes and enters an alertable wait, APCProc will be invoked. APCProc runs in the context of the target thread and is responsible for modifying the SEH list to add in a new handler. Because of this, APCProc can obtain the TIB without any extra code, and the rest is basically a copy/paste from part one.

void CALLBACK APCProc(ULONG_PTR dwParam)
{
    fprintf(stderr, "APC callback invoked. Raising exception to trigger exception handler.\n");
 
    //FS:[0x18] is the TIB's self pointer; the TIB's first field is the SEH chain head.
    EXCEPTION_REGISTRATION *pHandlerBase = (EXCEPTION_REGISTRATION *)__readfsdword(0x18);
 
    fprintf(stderr, "Segment address of target thread: %p.\n", pHandlerBase);
 
    EXCEPTION_REGISTRATION NewHandler = { pHandlerBase->pPrevHandler,
        (EXCEPTION_REGISTRATION::pFncHandler)(MyTestHandler) };
 
    pHandlerBase->pPrevHandler = &NewHandler;
 
    RaiseException(STATUS_ACCESS_VIOLATION, 0, 0, nullptr);
 
    //NewHandler lives on this stack frame, so unlink it before returning.
    pHandlerBase->pPrevHandler = NewHandler.pPrevHandler;
}

The handler itself, MyTestHandler, is independent of all of this and doesn’t change much either.

EXCEPTION_DISPOSITION __cdecl MyTestHandler(EXCEPTION_RECORD *pExceptionRecord, void *pEstablisherFrame,
    CONTEXT *pContextRecord, void *pDispatcherContext)
{
    if (pExceptionRecord->ExceptionCode == STATUS_ACCESS_VIOLATION)
    {
        MessageBox(0, L"Some hidden functionality can go here.",
            L"Test", 0);
        return ExceptionContinueExecution;
    }
 
    return ExceptionContinueSearch;
}

Below are some screenshots of this at work on a 32-bit Notepad++ instance.

Screenshot 1: Thread 5504 is chosen here.
Screenshot 2: The MessageBox in the exception handler successfully pops up. Hitting the “OK” button resumes execution as normal.

The source for the projects (Visual Studio 2013, Update 4) presented in these parts can be found here. Thanks for reading and follow on Twitter for more updates.

April 3, 2015

Hiding Functionality with Exception Handlers (1/2)

Filed under: General x86,Programming,Reverse Engineering — admin @ 9:56 AM

This post will cover the topic of hiding functionality by taking advantage of the OS-supported exception handling provided by Windows. Namely, it will cover Structured Exception Handling (SEH): how it can be utilized to obscure control flow at runtime, and how it can make static analysis of a binary more difficult. Only the relevant parts of SEH will be covered here; the full details can be found on the MSDN page. Due to the differences in exception handling between Windows on x86 and x64, the general technique and accompanying code presented here are relevant to x86 only. The code will also show how to manually add exception records, without the use of the SetUnhandledExceptionFilter API. This technique has been seen in PE protectors, anti-intrusion bypass systems, and malware. As always, the material presented is for educational and research purposes; don’t do anything dumb/criminal with it.

Structured exception handling is best demonstrated through Microsoft's extensions to C and C++, namely the __try and __except statements. For example, the following code uses SEH:

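#include <cstdio>
#include <Windows.h>
 
int main()
{
    __try
    {
        printf("Hello, World!\n");
        int *pNull = nullptr;
        *pNull = 0x0BADC0DE;
    }
    //The filter evaluates to 1 (EXCEPTION_EXECUTE_HANDLER) only for access violations.
    __except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION)
    {
        printf("In exception handler.\n");
    }
 
    return 0;
}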

Using SEH, the access violation arising from the null pointer dereference will be caught by the user-defined handler, something that is not possible in standard C++. This works because, at the beginning of the function, the compiler sets up an exception frame for this code. Viewing the disassembly for the function makes it more apparent how this happens.

00B21003 6A FF                push        0FFFFFFFFh  
00B21005 68 18 25 B2 00       push        0B22518h  
00B2100A 68 AC 10 B2 00       push        0B210ACh  
00B2100F 64 A1 00 00 00 00    mov         eax,dword ptr fs:[00000000h]  
00B21015 50                   push        eax  
00B21016 64 89 25 00 00 00 00 mov         dword ptr fs:[0],esp  

This code appears confusing at first, but can be cleared up by reading the crash course explanation page linked above. The code begins by pushing three values onto the stack. The first two, 0xFFFFFFFF and 0x00B22518, will be ignored in this explanation; they correspond to fields of the exception registration record: the try level and the scope table, respectively. There are some obfuscation tricks involving the scope table, but they won’t be discussed in this post. The full explanation of these fields and their purpose can be found on the crash course page. The third value, 0x00B210AC, is an important one. Following it in a debugger leads to the symbol __except_handler4.

Once this frame is registered, __except_handler4 is the topmost handler of the exception chain, so it is the first to be called when an exception occurs and begins dispatching the exception. SEH works in such a way that the topmost exception handler in the chain is called first and has a chance to handle the exception. If the exception is not handled, the handler of the next entry in the chain is called, and so on, until the exception is either handled or the final exception handler is called and the program aborts with an unhandled exception.

Afterwards, the value at FS:[0] is moved into the EAX register. On x86 Windows, the FS segment points to a special structure called the Thread Information Block (TIB), so FS:[0] is the first field of the TIB: a pointer to the current SEH frame, i.e., the head of the SEH list. This value is then pushed onto the stack, and the stack pointer (ESP) is stored into FS:[0]. What is happening here is that an exception registration record is constructed on the stack and placed at the head of the SEH list. This allows for proper stack unwinding and exception handler call order in the event of an exception. The format of the record is documented on the crash course page and can be converted to a structure, with the irrelevant fields omitted, as follows:

typedef struct _EXCEPTION_REGISTRATION
{
    using pFncHandler = EXCEPTION_DISPOSITION (__cdecl *)(EXCEPTION_RECORD *, _EXCEPTION_REGISTRATION *,
        CONTEXT *, void *);
 
    struct _EXCEPTION_REGISTRATION *pPrevHandler;
    pFncHandler pHandler;
 
    //Missing fields here:
    //Scope table
    //Try level
    //EBP
} EXCEPTION_REGISTRATION, *PEXCEPTION_REGISTRATION;
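To make the list mechanics concrete, here is a portable model of an SEH chain. It is a simplified sketch, not the Windows implementation: a global pointer stands in for FS:[0], nullptr stands in for the 0xFFFFFFFF end-of-chain marker, unwinding is not modeled, and all types and functions are hypothetical. Register mirrors the last two prologue instructions shown earlier (push the old FS:[0], then store the new record's address into FS:[0]), and Dispatch walks from the head until a handler stops the search.

```cpp
// Simplified stand-in for EXCEPTION_DISPOSITION (hypothetical).
enum class Disposition { ContinueExecution, ContinueSearch };

// Simplified stand-in for the registration record: just the two fields above.
struct Registration
{
    Registration *pPrev;                      // next record toward the end of the chain
    Disposition (*pHandler)(unsigned code);   // handler for this frame
};

// Models FS:[0], the head of the current thread's chain (nullptr = end of chain).
Registration *g_pHead = nullptr;

// Prologue: the new record remembers the old head and becomes the head.
void Register(Registration &record)
{
    record.pPrev = g_pHead;
    g_pHead = &record;
}

// Epilogue: restore the previous head.
void Unregister(Registration &record)
{
    g_pHead = record.pPrev;
}

// Call handlers from the head down; report whether one handled the exception
// and how many handlers were called along the way.
bool Dispatch(unsigned code, int &handlersCalled)
{
    handlersCalled = 0;
    for (Registration *p = g_pHead; p != nullptr; p = p->pPrev)
    {
        ++handlersCalled;
        if (p->pHandler(code) == Disposition::ContinueExecution)
            return true;   // handled: execution resumes
    }
    return false;          // reached the end: unhandled exception
}
```

For example, registering a handler that only accepts 0xC0000005 (STATUS_ACCESS_VIOLATION) and then one that always continues the search means an access violation calls the most recently registered handler first and is handled by the second one.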

Knowing the layout of these exception records and where to find them in memory, it is rather straightforward to modify the list. The steps are as follows:

  • Get the address of the TIB through the FS segment
  • Get a pointer to the current SEH frame from the TIB
  • Link a new record holding a custom handler in at the head of the SEH list

Put into code, it looks like the following:

#include <cstdio>
#include <Windows.h>
 
typedef struct _EXCEPTION_REGISTRATION
{
    using pFncHandler = EXCEPTION_DISPOSITION (__cdecl *)(EXCEPTION_RECORD *, _EXCEPTION_REGISTRATION *,
        CONTEXT *, void *);
 
    struct _EXCEPTION_REGISTRATION *pPrevHandler;
    pFncHandler pHandler;
 
} EXCEPTION_REGISTRATION, *PEXCEPTION_REGISTRATION;
 
//Base of TIB structure but we only care about exception chain.
EXCEPTION_REGISTRATION *pHandlerBase = (EXCEPTION_REGISTRATION *)__readfsdword(0x18);
 
EXCEPTION_DISPOSITION __cdecl MyTestHandler(EXCEPTION_RECORD *pExceptionRecord, void *pEstablisherFrame,
    CONTEXT *pContextRecord, void *pDispatcherContext)
{
    printf("Hello, World!\n");
 
    return ExceptionContinueExecution;
}
 
int main(int argc, char *argv[])
{
    fprintf(stderr, "TIB Base (Pointer to current SEH Frame): %p.\n", pHandlerBase);
 
    EXCEPTION_REGISTRATION NewHandler = { pHandlerBase->pPrevHandler,
        (EXCEPTION_REGISTRATION::pFncHandler)(MyTestHandler) };
 
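    //pPrevHandler at the TIB base is really the TIB's ExceptionList field: the head of the chain
    pHandlerBase->pPrevHandler = &NewHandler;
 
    RaiseException(0, 0, 0, nullptr);
 
    //NewHandler lives on main's stack, so unlink it before returning.
    pHandlerBase->pPrevHandler = NewHandler.pPrevHandler;
 
    return 0;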
}

Here a new handler, MyTestHandler, is added to the SEH chain. It gets invoked on the RaiseException call, prints a string, and tells the program to continue execution. Looking at the disassembly, no exception registration records were generated for main, since it doesn’t use __try/__except, so the RaiseException call will appear to go to the unhandled exception filter and crash the application. However, the handler installed at runtime through the TIB prevents this, and control actually ends up somewhere unexpected. In addition to adding a new handler, it is also possible to replace an existing one.

Replacing entries in the SEH chain works on a per-thread basis. If the SEH list is modified on one thread and another thread raises an exception, the new SEH handler will not be called. Replacing SEH handlers for arbitrary threads and dispatching exceptions to run in their context will be the topic of the next post.
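Each thread has its own TIB, and therefore its own FS:[0], which is why a modified chain only affects the thread it was modified on. A portable way to model this is a thread_local head pointer. In the sketch below (hypothetical names, with std::thread standing in for Windows threads), a record is installed on the calling thread, and a second thread still sees an empty chain.

```cpp
#include <thread>

// Hypothetical, stripped-down registration record.
struct Registration
{
    Registration *pPrev;
    int id;   // identifies which handler this record stands for
};

// Each thread gets its own head pointer, just as each thread has its own FS:[0].
thread_local Registration *t_pHead = nullptr;

// Install a record at the head of the *calling* thread's chain.
void Install(Registration &record)
{
    record.pPrev = t_pHead;
    t_pHead = &record;
}

// Returns the id of the head record as seen from a freshly started thread,
// or -1 if that thread's chain is empty.
int HeadIdOnOtherThread()
{
    int id = -2;
    std::thread worker([&id] { id = (t_pHead != nullptr) ? t_pHead->id : -1; });
    worker.join();
    return id;
}
```

After Install on the main thread, the main thread's head is the new record while HeadIdOnOtherThread still reports -1, which is why the next post needs a way to run code on the target thread itself.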

December 20, 2014

Writing a Primitive Debugger: Part 5 (Miscellaneous)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 2:43 PM

Welcome to the final installment of how to write a primitive debugger. This post will cover some miscellaneous topics that were not covered in the previous articles, in order to add some missing core functionality: how to display a disassembly listing, how to step over code (i.e., step past a CALL rather than into it), and how to dump and modify arbitrary memory of a process.

Disassembly

In order to display a disassembly dump on x86 and x64, this debugger will take advantage of the BeaEngine disassembly library. This is a very handy library that supports the 16/32/64-bit Intel instruction sets as well as floating point and vector extensions. The project is open source for those interested in looking at the internals of the disassembler. In the example code, it is distributed as DLLs that are loaded and used at runtime. This is done as a convenience, to avoid possibly having to recompile static libraries.

The disassembler code will be pretty straightforward to work with. BeaEngine has a DISASM structure that needs to be initialized with the architecture type and an address. This is then passed along to a Disasm function, which fills the structure with information about the instruction at the address. Since the disassembler is dynamically loaded, and is used for x86/x64 in the same code, the function pointer to Disasm needs to be retrieved. All of this initialization code can be handled in the constructor.

Disassembler::Disassembler(HANDLE hProcess) : m_hProcess{ hProcess }
{
    memset(&m_disassembler, 0, sizeof(DISASM));
#ifdef _M_IX86
    m_disassembler.Archi = 0;
    if (m_hDll == nullptr)
    {
        m_hDll = LoadLibrary(L"BeaEngine_x86.dll");
        m_pDisasm = (pDisasm)GetProcAddress(m_hDll, "_Disasm@4");
    }
#elif defined _M_AMD64
    m_disassembler.Archi = 64;
    if(m_hDll == nullptr)
    {
        m_hDll = LoadLibrary(L"BeaEngine_x64.dll");
        m_pDisasm = (pDisasm)GetProcAddress(m_hDll, "Disasm");
    }
#else
#error "Unsupported architecture"
#endif
}

with m_hDll and m_pDisasm being static, since there’s no need to retrieve these per instance. Since the code is meant to work on x86/x64, there are two separate versions of the DLL provided — one for use in x86 applications, the other for x64.

Now that the disassembly engine is loaded and initialized, it is time to actually begin disassembling code. There is an interesting problem, however. The debugger is attached to another process, but the disassembler works on memory in the debugger’s own address space. When the user requests disassembly at, say, address 0x00411000, the bytes at 0x00411000 in the debugger have no relation to the bytes at 0x00411000 in the target, since each process has its own virtual address space. So the solution isn’t as easy as setting the disassembly address to 0x00411000 and calling Disasm.

Instead, the memory at 0x00411000 in the target process must be read into the debugger and disassembled there. Something similar was already done when implementing interrupt breakpoints: the original byte at the address was saved before being replaced with an 0xCC opcode. Here, too, it is as simple as calling ReadProcessMemory and storing the buffer.

const bool Disassembler::TransferBytes(const DWORD_PTR dwAddress)
{
    SIZE_T ulBytesRead = 0;
    bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess, (LPCVOID)dwAddress, m_bytes.data(), m_bytes.size(), &ulBytesRead));
    if (bSuccess && ulBytesRead == m_bytes.size())
    {
        return true;
    }
    else
    {
        fprintf(stderr, "Could not read from %p. Error = %X\n", dwAddress, GetLastError());
    }
 
    return false;
}

Once that is done, the disassembly process is no more difficult than the BeaEngine example. The target disassembly address is set and the Disasm function is called through the function pointer retrieved from the DLL. This function fills the DISASM structure (m_disassembler in the code), and returns the length of the instruction. This can be added to the previous address to get the address of the next instruction, and the process repeats.

const bool Disassembler::BytesAtAddress(DWORD_PTR dwAddress, size_t ulInstructionsToDisassemble /*= 15*/)
{
    if (IsInitialized())
    {
        SetDisassembler(dwAddress);
        bool bFailed = false;
        while (!bFailed && ulInstructionsToDisassemble-- > 0)
        {
            int iDisasmLength = m_pDisasm(&m_disassembler);
            if (iDisasmLength != UNKNOWN_OPCODE)
            {
                fprintf(stderr, "0x%p - %s\n", dwAddress, m_disassembler.CompleteInstr);
                m_disassembler.EIP += iDisasmLength;
                dwAddress += iDisasmLength;
            }
            else
            {
                fprintf(stderr, "Error: Reached unknown opcode in disassembly.\n");
                bFailed = true;
            }
        }
    }
    else
    {
        fprintf(stderr, "Could not show disassembly at address. Disassembler Dll was not loaded properly.\n");
        return false;
    }
 
    return true;
}
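The listing loop only depends on the decoder returning each instruction's length. As a portable illustration, the sketch below swaps BeaEngine for a hypothetical toy decoder that understands only the handful of x86 opcodes appearing in the a function listing later in this post. It walks the bytes the same way BytesAtAddress does, and it also shows how a relative CALL target is formed, which requires the instruction's address in the target rather than its address in the debugger's buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy decoder standing in for BeaEngine's Disasm: returns the length
// of the instruction at pCode, or -1 (like UNKNOWN_OPCODE) if it is not one of the
// few forms below or if too few bytes are available.
int ToyInstructionLength(const std::uint8_t *pCode, std::size_t available)
{
    if (available == 0)
        return -1;
    switch (pCode[0])
    {
    case 0x68: return 5;                                             // push imm32
    case 0xE8: return 5;                                             // call rel32
    case 0x83: return (available >= 2 && pCode[1] == 0xC4) ? 3 : -1; // add esp, imm8
    case 0xFF: return (available >= 2 && pCode[1] == 0x15) ? 6 : -1; // call dword ptr [abs32]
    case 0x5B: case 0x5D: case 0x5E: case 0x5F:                      // pop ebx/ebp/esi/edi
    case 0xC3: case 0xCC:                                            // ret, int3
        return 1;
    default:
        return -1;
    }
}

// Decode, add the length to the address, repeat. Addresses are reported in the
// target's address space (baseAddress), not as pointers into the local buffer.
std::vector<std::uint32_t> ListInstructionAddresses(const std::vector<std::uint8_t> &bytes,
                                                    std::uint32_t baseAddress)
{
    std::vector<std::uint32_t> addresses;
    std::size_t offset = 0;
    while (offset < bytes.size())
    {
        const int length = ToyInstructionLength(&bytes[offset], bytes.size() - offset);
        if (length < 0 || offset + static_cast<std::size_t>(length) > bytes.size())
            break;   // unknown opcode or truncated instruction
        addresses.push_back(baseAddress + static_cast<std::uint32_t>(offset));
        offset += static_cast<std::size_t>(length);
    }
    return addresses;
}

// Target of an E8 call: address of the next instruction plus the signed 32-bit
// little-endian displacement (unsigned wraparound handles negative values).
std::uint32_t CallRel32Target(const std::uint8_t *pCode, std::uint32_t instructionAddress)
{
    const std::uint32_t displacement =
        static_cast<std::uint32_t>(pCode[1]) |
        (static_cast<std::uint32_t>(pCode[2]) << 8) |
        (static_cast<std::uint32_t>(pCode[3]) << 16) |
        (static_cast<std::uint32_t>(pCode[4]) << 24);
    return instructionAddress + 5 + displacement;
}
```

With the bytes of that listing (68 48 21 40 00 FF 15 94 20 40 00 83 C4 04 E8 14 00 00 00 5F) and a base of 0x00401009, the walk yields 0x00401009, 0x0040100E, 0x00401014, 0x00401017 and 0x0040101C, and the CALL at 0x00401017 resolves to 0x00401030, matching Visual Studio's "call b (0401030h)".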

The SetDisassembler function is responsible for pointing the disassembler at the correct offset within the debugger’s local copy of the target process’s memory. The debugger keeps a 4096-byte cache (the default Windows page size on x86/x64) and uses it if the address to disassemble falls within that range. Otherwise, the memory is read again and the cache re-initialized.

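void Disassembler::SetDisassembler(const DWORD_PTR dwAddress)
{
    //Reuse the cache only if the address is inside it and enough bytes remain for a
    //default 15-instruction listing (an x86/x64 instruction is at most 15 bytes long).
    const size_t ulMaxListingBytes = 15 * 15;
    const bool bIsCached = (dwAddress >= m_dwStartAddress) &&
        ((dwAddress - m_dwStartAddress) + ulMaxListingBytes <= m_bytes.size());
    if (!bIsCached)
    {
        //Cache miss: re-read the target's memory starting at the requested address.
        if (TransferBytes(dwAddress))
        {
            m_dwStartAddress = dwAddress;
        }
        m_disassembler.EIP = (UIntPtr)m_bytes.data();
    }
    else
    {
        //Cache hit: point BeaEngine into the existing local copy.
        m_disassembler.EIP = (UIntPtr)&m_bytes.data()[dwAddress - m_dwStartAddress];
    }
}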

And that’s all it takes. The debugger can now print a disassembly listing at any readable address.
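The cache test is easy to get wrong because of unsigned arithmetic. A minimal portable sketch, with a hypothetical function name and the 15-byte maximum x86/x64 instruction length as its only assumption: when the address lies below the cache start, the unsigned subtraction wraps around to a huge value, so one comparison covers both bounds, and the 15-byte margin keeps a decode from running off the end of the buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: can one instruction at 'address' be decoded from a local
// cache holding 'cacheSize' bytes copied from 'cacheStart' in the target?
bool CanDecodeFromCache(std::uintptr_t cacheStart, std::size_t cacheSize,
                        std::uintptr_t address)
{
    const std::size_t kMaxInstructionLength = 15;   // architectural limit on x86/x64
    // If address < cacheStart, the subtraction wraps to a huge value and the
    // comparison below fails, so no separate lower-bound check is needed.
    const std::uintptr_t offset = address - cacheStart;
    return cacheSize >= kMaxInstructionLength &&
           offset <= cacheSize - kMaxInstructionLength;
}
```

With a 4096-byte cache starting at 0x00401000, addresses from 0x00401000 through 0x00401FF1 qualify, while 0x00400FFF and 0x00402000 force a fresh read.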

Step Over

Step into is the ability to step one instruction at a time as it executes and is something that is supported at the hardware level with the single step flag. Step over is implemented purely in code and is a convenience function that lets the user skip stepping into branches in the code. For example, take the following disassembly listing:

0040108D 81 C4 C0 00 00 00    add         esp,0C0h  
00401093 3B EC                cmp         ebp,esp  
00401095 E8 76 03 00 00       call        SomeFunction (0401410h)  
0040109A 8B E5                mov         esp,ebp  
...

Assume that you are broken at address 0x0040108D. You know that SomeFunction is not of any interest to you and you don’t want to single step through it. You’d rather get to the more interesting parts at address 0x0040109A and below. So, when you’re at 0x00401093, you set a breakpoint at 0x0040109A and continue execution. This effectively skips the CALL instruction at 0x00401095 and hits your breakpoint at the instruction immediately following it, so you can continue debugging. Step over wraps these steps into one convenient function provided by a debugger.

In order to perform a step over, the debugger must know what the next instruction is. This is obviously needed because it is the instruction that the user wishes to break at next. The next instruction can be one of a few types:

  1. Invalid
  2. A non-branching instruction (e.g. add/mov/lea/push/…)
  3. A conditional branching instruction (e.g. jz/jge/jb/…)
  4. An unconditional branching instruction (e.g. call/jmp/ret)

If it’s an invalid instruction, then it’s up to the debugger implementation to decide what to do next. In the second case, the next instruction is simply the address of the current one plus the length of the current instruction. The third case is interesting and is also partially implementation defined. If the user is broken on a conditional branch and wishes to step over, how should that be treated? For example, assume the user is looking at the following disassembly listing and is broken on 0x00401219:

00401213 8B 45 F8             mov         eax,dword ptr [a]  
00401216 3B 45 EC             cmp         eax,dword ptr [b]  
00401219 7E 05                jle         test+60h (0401220h)  
0040121B E8 50 FF FF FF       call        d (0401170h)  
00401220 8B F4                mov         esi,esp  

Assume [a] is greater than [b], so the jump will not be taken and the next instruction will be 0x0040121B. The user decides that they want to step over, so they will land at 0x0040121B, which is correct. Now assume the opposite: that [a] is less than or equal to [b]. This means that the branch will be taken and the next address will be 0x00401220. If the user is at 0x00401219 and decides to step over, then what happens? Since 0x0040121B will not be reached, that step over point isn’t necessarily valid. Should execution continue, since the step over point will not be reached, or should the debugger “fix” it for the user and break at 0x00401220? Different debuggers do different things here. I would personally go with the latter, just to be safe, especially since the debugger has access to the EFLAGS register and can tell whether the branch will be taken before the instruction executes. This particular scenario is left undefined in the example code.
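Because the flags are already set when the debugger is broken on the branch, the taken/not-taken question can be answered before the instruction executes. The sketch below covers only a few signed and unsigned conditions; the enum and function names are hypothetical, and the bit positions are the architectural EFLAGS ones (CF = bit 0, ZF = bit 6, SF = bit 7, OF = bit 11).

```cpp
#include <cstdint>

enum class Condition { Jz, Jnz, Jl, Jle, Jg, Jge, Jb, Jbe };

// Architectural EFLAGS bits.
constexpr std::uint32_t kCF = 1u << 0;
constexpr std::uint32_t kZF = 1u << 6;
constexpr std::uint32_t kSF = 1u << 7;
constexpr std::uint32_t kOF = 1u << 11;

// Evaluate a conditional jump against the current flags.
bool IsBranchTaken(Condition cond, std::uint32_t eflags)
{
    const bool cf = (eflags & kCF) != 0;
    const bool zf = (eflags & kZF) != 0;
    const bool sfNeOf = ((eflags & kSF) != 0) != ((eflags & kOF) != 0);   // signed "less"
    switch (cond)
    {
    case Condition::Jz:  return zf;
    case Condition::Jnz: return !zf;
    case Condition::Jl:  return sfNeOf;
    case Condition::Jle: return zf || sfNeOf;
    case Condition::Jg:  return !zf && !sfNeOf;
    case Condition::Jge: return !sfNeOf;
    case Condition::Jb:  return cf;
    case Condition::Jbe: return cf || zf;
    }
    return false;
}

// Where a "fixed-up" step over would break: the branch target if the branch is
// taken, otherwise the instruction right after the branch.
std::uint32_t NextAddressForBranch(Condition cond, std::uint32_t eflags,
                                   std::uint32_t branchAddress, std::uint32_t length,
                                   std::uint32_t target)
{
    return IsBranchTaken(cond, eflags) ? target : branchAddress + length;
}
```

For the two-byte jle at 0x00401219 above, flags with ZF and SF both clear (and OF clear) give 0x0040121B, while ZF set, or SF set with OF clear, gives 0x00401220.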

The last scenario is that of an unconditional branch. The two unconditional branches that affect implementing step over are JMP (unconditional jump) and RET (return). Under both of these, the point of execution is guaranteed to change: either to the jump destination or to the return address on the stack. Stepping over a RET instruction is pretty useless, because the instruction after it will not be executed next. Likewise, stepping over a JMP instruction, in 95% of cases, will also be useless, since execution will most likely not come back to the instruction following it. For these cases, the example code converts the step over into a step into and follows execution. Having said all of this, the next instruction retrieval function is implemented as follows:

DWORD_PTR Disassembler::GetNextInstruction(const DWORD_PTR dwAddress, bool &bIsUnconditionalBranch)
{
    DWORD_PTR dwNextAddress = 0;
    if (IsInitialized())
    {
        SetDisassembler(dwAddress);
        int iDisasmLength = m_pDisasm(&m_disassembler);
        if (iDisasmLength != UNKNOWN_OPCODE)
        {
            if (m_disassembler.Instruction.BranchType == RetType || m_disassembler.Instruction.BranchType == JmpType)
            {
                bIsUnconditionalBranch = true;
            }
            else
            {
                dwNextAddress = (dwAddress + iDisasmLength);
            }
        }
        else
        {
            fprintf(stderr, "Could not get next instruction. Unknown opcode at %p.\n", (void *)dwAddress);
        }
    }
    else
    {
        fprintf(stderr, "Could not get next instruction. Disassembler Dll was not loaded properly.\n");
    }
 
    return dwNextAddress;
}

with the full StepOver function being implemented as follows:

const bool Debugger::StepOver()
{
    CONTEXT ctx = GetExecutingContext();
    bool bIsUnconditionalBranch = false;
#ifdef _M_IX86
    DWORD_PTR dwStepOverAddress = m_pDisassembler->GetNextInstruction(ctx.Eip, bIsUnconditionalBranch);
#elif defined _M_AMD64
    DWORD_PTR dwStepOverAddress = m_pDisassembler->GetNextInstruction(ctx.Rip, bIsUnconditionalBranch);
#else
#error "Unsupported platform"
#endif
    if (bIsUnconditionalBranch)
    {
        return StepInto();
    }
    else if (dwStepOverAddress != 0)
    {
        m_pStepPoint->Disable();
        m_pStepPoint->ChangeAddress(dwStepOverAddress);
        (void)m_pStepPoint->Enable();
 
        ctx.EFlags &= ~0x100;
        (void)SetExecutingContext(ctx);
 
        return Continue(true);
    }
 
    return false;
}

with m_pStepPoint being a breakpoint to the step over address.
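Stripped of the Windows plumbing, the decision that GetNextInstruction and StepOver make together is small enough to model on its own. The sketch below is a hypothetical, portable restatement (BranchKind, StepAction and PlanStepOver are not names from the example code): RET and JMP fall back to a step into, an unknown opcode fails, and everything else, including CALLs and, as in the example code, conditional branches, gets a breakpoint on the next sequential instruction.

```cpp
#include <cstdint>

enum class BranchKind { Unknown, None, Conditional, Call, Jmp, Ret };

enum class StepAction { Fail, StepInto, BreakAtNext };

struct StepPlan
{
    StepAction action;
    std::uint64_t breakAddress;   // only meaningful for BreakAtNext
};

StepPlan PlanStepOver(BranchKind kind, std::uint64_t address, int length)
{
    if (kind == BranchKind::Unknown || length <= 0)
        return { StepAction::Fail, 0 };
    if (kind == BranchKind::Jmp || kind == BranchKind::Ret)
        return { StepAction::StepInto, 0 };   // next sequential instruction is unlikely to run next
    // Non-branching instructions, CALLs and conditional branches all get a
    // temporary breakpoint on the instruction that follows.
    return { StepAction::BreakAtNext, address + static_cast<std::uint64_t>(length) };
}
```

For the five-byte CALL at 0x00401017 in the a function, this plans a breakpoint at 0x0040101C, which is where the step over in the test session below lands.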

Dump and modify memory

This last piece of functionality is nothing more than an exercise in calling ReadProcessMemory and WriteProcessMemory.

const bool Debugger::PrintBytesAt(const DWORD_PTR dwAddress, size_t ulNumBytes /*= 40*/)
{
    SIZE_T ulBytesRead = 0;
    std::unique_ptr<unsigned char[]> pBuffer = std::unique_ptr<unsigned char[]>(new unsigned char[ulNumBytes]);
    const bool bSuccess = BOOLIFY(ReadProcessMemory(m_hProcess(), (LPCVOID)dwAddress, pBuffer.get(), ulNumBytes, &ulBytesRead));
    if (bSuccess && ulBytesRead == ulNumBytes)
    {
        for (unsigned int i = 0; i < ulBytesRead; ++i)
        {
            fprintf(stderr, "%02X ", pBuffer.get()[i]);
        }
        fprintf(stderr, "\n");
        return true;
    }
 
    fprintf(stderr, "Could not read memory at %p. Error = %X\n", dwAddress, GetLastError());
    return false;
}
 
const bool Debugger::ChangeByteAt(const DWORD_PTR dwAddress, const unsigned char cNewByte)
{
    SIZE_T ulBytesWritten = 0;
    const bool bSuccess = BOOLIFY(WriteProcessMemory(m_hProcess(), (LPVOID)dwAddress, &cNewByte, sizeof(unsigned char), &ulBytesWritten));
    if (bSuccess && ulBytesWritten == sizeof(unsigned char))
    {
        return true;
    }
 
    fprintf(stderr, "Could not change byte at %p. Error = %X\n", dwAddress, GetLastError());
    return false;
}

Testing the functionality

The same example program as in the previous posts will be used, with minor modifications:

#include <cstdio>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    int i = 0x1234;
    printf("c called.\n");
    printf("i is at %p with value %X.\n", &i, i);
    d();
    printf("i is at %p with value %X.\n", &i, i);
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

To test memory modification, the i variable can be modified while the program is broken in the d function. The single-letter lines (a, i, e, c) are commands entered by the user, followed by the values typed at each prompt.

a
[A]ddress or [s]ymbol name? s
Name: d
Received breakpoint at address 00401170.
Press c to continue, s to step into, o to step over.
i
Enter address to print bytes at: 0x18fcac
34 12 00 00 CC CC CC CC 0C AD C2 AA 8C FD 18 00 8A 10 40 00 60 FE 18 00 94 FD 18 00 00 E0 FD 7F CC CC CC CC CC CC CC CC
e
Enter address to change byte at: 0x18fcac
Enter new byte: 0x12
e
Enter address to change byte at: 0x18fcad
Enter new byte: 0x34
c
Received step at address 00401171

Output from the target application:

Addresses:
a: 00401000
b: 00401050
c: 004010A0
d: 00401170

a called.
b called.
c called.
i is at 0018FCAC with value 1234.
d called.
i is at 0018FCAC with value 3412.
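The value ends up as 0x3412 because x86 stores integers little-endian: the dump shows 34 12 00 00 at 0x18fcac (low byte first), so writing 0x12 to 0x18fcac and 0x34 to 0x18fcad swaps the two low bytes. The sketch below replays the same two single-byte writes on a local value instead of another process's memory; PatchByte is a hypothetical name, and the result assumes a little-endian host such as x86/x64.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Overwrite one byte of a 32-bit value's in-memory representation, the way
// ChangeByteAt patches the target one byte at a time. Offsets past the end
// of the value are ignored.
std::uint32_t PatchByte(std::uint32_t value, std::size_t offset, std::uint8_t newByte)
{
    if (offset >= sizeof(value))
        return value;
    unsigned char bytes[sizeof(value)];
    std::memcpy(bytes, &value, sizeof(value));
    bytes[offset] = newByte;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}
```

Starting from 0x1234, PatchByte at offset 0 with 0x12 and then at offset 1 with 0x34 yields 0x3412 on a little-endian machine, matching the second line printed by c.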

Disassembly and step over are pretty straightforward to test when lined up with the Visual Studio debugger. For example, below is the disassembly relevant to the a function:

//printf("a called.\n");
00401009 68 48 21 40 00       push        402148h  
0040100E FF 15 94 20 40 00    call        dword ptr ds:[402094h]  
00401014 83 C4 04             add         esp,4  
//b();
00401017 E8 14 00 00 00       call        b (0401030h)  
0040101C 5F                   pop         edi  
}
...

Setting a breakpoint on 0x00401009 and stepping over shows the following behavior in the debugger:

a
[A]ddress or [s]ymbol name? a
Breakpoint address: 0x401009
Received breakpoint at address 00401009.
Press c to continue, s to step into, o to step over.
o
Could not write back original opcode to address 00000000. Error = 1E7
Received breakpoint at address 0040100E.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 00401014.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 00401017.
Press c to continue, s to step into, o to step over.
o
Received breakpoint at address 0040101C.
Press c to continue, s to step into, o to step over.

Lastly, a disassembly listing for all of this can be displayed:

d
Enter address to print disassembly at: 0x401009
0x00401009 - push 00402148h
0x0040100E - call dword ptr [00402094h]
0x00401014 - add esp, 04h
0x00401017 - call 0067D3A3h
0x0040101C - pop edi
0x0040101D - pop esi
0x0040101E - pop ebx
0x0040101F - mov esp, ebp
0x00401021 - pop ebp
0x00401022 - ret
0x00401023 - int3
0x00401024 - int3
0x00401025 - int3
0x00401026 - int3
0x00401027 - int3

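which lines up with what Visual Studio gives, with one exception: the relative CALL at 0x00401017 is shown as call 0067D3A3h instead of Visual Studio’s call b (0401030h). BeaEngine computes relative branch targets from the EIP field, which here points into the debugger’s local copy of the bytes rather than at the instruction’s address in the target; BeaEngine’s DISASM structure has a VirtualAddr field intended for this situation.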

Wrap up

Writing a debugger may seem like a daunting task, but it is certainly attainable. Aside from the disassembly engine — which can be a whole long series of posts in itself — everything was written from scratch in about 2,000 lines of code (doing a ‘\n’ regex search on the solution yields 2195 lines). Contained within those lines of code is the ability to

  • Add/Remove breakpoints
  • Step into / Step over instructions
  • Continue execution at a breakpoint or step
  • Print / Modify registers
  • Print a call stack
  • Match symbols to addresses / Dump symbols for a module
  • Print / Modify memory
  • Disassemble at an address

While it’s certainly not WinDbg or the Visual Studio debugger, it is an impressive amount for relatively little work. Hopefully those following this series of posts have gained a bit of insight into how the tools that they may use on a frequent basis work, and what it takes to develop them. Thanks for reading.


The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 or newer is most likely required.

December 11, 2014

Writing a Primitive Debugger: Part 4 (Symbols)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 7:30 PM

Up to now, we have developed a debugger that can attach to and detach from a process, set and remove breakpoints, print registers and a call stack, and modify control flow by changing the executing thread context. These are all pretty essential features of a debugger. The topic of this post, debug symbols, is more of a “nice-to-have”. An application may or may not ship with debug symbols, but when they are available, e.g., for your own application, the process of debugging becomes significantly simpler.

Debug Symbols

At its simplest definition, a debug symbol is a piece of information that shows how specific parts of a compiled program map back to the source level. For example, a debug symbol might tell information about the name of a variable at a memory address, or which line of code, and in which file, a series of assembly instructions map to. They are typically generated during debug builds and are used to provide some clarity to a developer that is debugging (or reverse engineering) a piece of code. There is no universal debug symbol format for a language, and they may vary between compilers. On the modern Windows platform, debug symbols come in the form of Program Database (PDB) files, ending with a .pdb extension.

These files hold a lot of useful information about the compiled executable or DLL. As mentioned above, they can contain information regarding which source file and line number (or which object file) a symbol at a certain address maps to. They can contain the names and types of global, static, and local variables, as well as classes and structs. They can also contain information about compiler optimizations that were used when compiling the code. Some of these things may not be present if the code was compiled with stripped symbols. During a debugging session, the debugger initializes a symbol handler, which looks for matching PDB files in common and/or user-specified directories (possibly recursively) and parses* them. While the user is debugging, symbol information can then be retrieved, and names and source line numbers can be displayed (if available).
* This is a useful open source parser that can parse the proprietary format of PDB files.
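To make the address-to-line mapping concrete, below is a minimal, portable sketch of the kind of line table a PDB records. LineEntry and LineTable are hypothetical names, not DbgHelp types; a real debugger would ask DbgHelp (for example via SymGetLineFromAddr64) rather than build such a table by hand. Each entry is keyed by the first address of a run of instructions, so a lookup finds the closest run starting at or before the queried address.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Hypothetical record: the source location a run of instructions maps to.
struct LineEntry
{
    std::string file;
    unsigned line;
};

// Hypothetical line table keyed by the first address of each instruction run.
class LineTable
{
public:
    void Add(std::uint64_t startAddress, const std::string &file, unsigned line)
    {
        m_entries[startAddress] = LineEntry{ file, line };
    }

    // Finds the run that starts at or before the address. Returns false if the
    // address precedes every known run.
    bool Lookup(std::uint64_t address, LineEntry &out) const
    {
        auto it = m_entries.upper_bound(address); // first run starting after address
        if (it == m_entries.begin())
        {
            return false;
        }
        out = std::prev(it)->second; // closest run at or below address
        return true;
    }

private:
    std::map<std::uint64_t, LineEntry> m_entries;
};
```

This simplification assumes each run extends up to the next entry; it does not track where the final run ends.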

Implementation

Microsoft provides a very rich set of APIs for handling symbols through the DbgHelp API. There are functions to load and enumerate symbols for a module, find a symbol by name or address, enumerate source file and line references found in PDBs, dynamically add or remove entries from the symbol table, interact with symbol stores, and much more. Given how large the API is, I’ve only chosen to demonstrate the more common features. One thing to keep in mind is that all DbgHelp functions are single threaded: calling them concurrently from more than one thread can lead to unexpected behavior or memory corruption. The example code only calls them from a single thread but has no synchronization to enforce that, so if you build on this code, make sure to serialize all DbgHelp calls.
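One way to serialize DbgHelp usage is to funnel every call through a single process-wide lock. The sketch below is generic and portable; DbgHelpLock and SerializedCall are hypothetical helpers, not part of DbgHelp or of the example code.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper: one lock shared by every DbgHelp call site.
inline std::mutex &DbgHelpLock()
{
    static std::mutex lock; // function-local static: thread-safe initialization since C++11
    return lock;
}

// Runs the callable while holding the lock, so at most one DbgHelp call is
// in flight at any time, and returns whatever the callable returns.
template <typename Func>
auto SerializedCall(Func &&func) -> decltype(func())
{
    std::lock_guard<std::mutex> guard(DbgHelpLock());
    return func();
}
```

On Windows, a call site might then look like `SerializedCall([&] { return BOOLIFY(SymFromAddr(hProcess, dwAddress, &dwDisplacement, pSymInfo)); });`, assuming the BOOLIFY macro from the example code. A single coarse lock is the simplest design that satisfies the single-threaded requirement; finer-grained locking would buy nothing, since DbgHelp itself cannot run calls in parallel.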

Initializing a symbol handler is straightforward: it merely involves calling SymInitialize. The function takes a process handle, which the debugger opens when it attaches, a user search path for locating PDB files, and a third parameter that specifies whether to enumerate all of the modules loaded in the process and load their symbols as well. For an attaching debugger, whether to enable this depends on the situation. Some cases, such as the debugger creating the target process itself, or delay-loaded DLLs, can cause some symbols not to be loaded. Additionally, if this third parameter is set to true and the symbol handler is initialized before all of the LOAD_DLL_DEBUG_EVENT events have been received, some symbols may not be loaded. The sample code therefore defaults it to false and loads symbols for each module in the CREATE_PROCESS_DEBUG_EVENT and LOAD_DLL_DEBUG_EVENT handlers, which ensures that the symbol files for every module are properly loaded.

Prior to initializing the symbol handler, SymSetOptions should be called to configure how and what information the symbol handler will load. Put into code, the initialization routine looks like the following:

Symbols::Symbols(const HANDLE hProcess, const HANDLE hFile, const bool bLoadAll /*= false*/)
    : m_hProcess{ hProcess }, m_hFile{ hFile }
{
    (void)SymSetOptions(SYMOPT_CASE_INSENSITIVE | SYMOPT_DEFERRED_LOADS |
        SYMOPT_LOAD_LINES | SYMOPT_UNDNAME);
 
    const bool bSuccess = BOOLIFY(SymInitialize(hProcess, nullptr, bLoadAll));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not initialize symbol handler. Error = %X.\n",
            GetLastError());
    }
}

The options here specify that symbol searches are case insensitive, that symbols aren’t loaded until a reference is made (not to be confused with the delay-loading of DLLs mentioned above), that line information is loaded, and that symbols are displayed in undecorated form. Case insensitivity and undecorated names are there for convenience; otherwise, it would be annoying to search for exact decorated names such as “?f@@YAHD@Z”.

When the symbol handler is finished, i.e. the debugger is detaching from the process, a simple call to SymCleanup will terminate the symbol handler:

Symbols::~Symbols()
{
    const bool bSuccess = BOOLIFY(SymCleanup(m_hProcess));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not terminate symbol handler. Error = %X.\n",
            GetLastError());
    }
}

That sets up the initialization and termination of the symbol handler. Time for everything in between.

Enumerating Symbols

One useful feature of a debugger is to enumerate all symbols of a module internally. This allows for storage and fast lookup at a later time, or for a graphical display that lets the user easily navigate from a symbol’s name to its address. Enumerating symbols is a two-step process: first SymLoadModuleEx is called to load the symbol table for the module, then SymEnumSymbols can be called with the base address of the module. SymEnumSymbols takes a callback of type PSYM_ENUMERATESYMBOLS_CALLBACK as a parameter. This callback is called for every symbol found in the module’s symbol table and receives a SYMBOL_INFO structure describing the symbol: its name, its address, whether it is a register, what value it holds if it’s a constant, etc. Put into code, this is rather straightforward:

const bool Symbols::EnumerateModuleSymbols(const char * const pModulePath, const DWORD64 dwBaseAddress)
{
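    DWORD64 dwBaseOfDll = SymLoadModuleEx(m_hProcess, m_hFile, pModulePath, nullptr,
        dwBaseAddress, 0, nullptr, 0);
    if (dwBaseOfDll == 0)
    {
        // SymLoadModuleEx returns 0 with ERROR_SUCCESS if the module is already loaded.
        const DWORD dwError = GetLastError();
        if (dwError != ERROR_SUCCESS)
        {
            fprintf(stderr, "Could not load symbols for %s. Error = %X.\n",
                pModulePath, dwError);
            return false;
        }
        dwBaseOfDll = dwBaseAddress;
    }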
 
    UserContext userContext = { this, pModulePath };
    const bool bSuccess = 
       BOOLIFY(SymEnumSymbols(m_hProcess, dwBaseOfDll, "*!*", SymEnumCallback, &userContext));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not enumerate symbols for %s. Error = %X.\n",
            pModulePath, GetLastError());
    }
 
    return bSuccess;
}
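The “storage and fast lookup” idea can be sketched independently of DbgHelp. Below, a hypothetical SymbolTable stores the fields an enumeration callback receives (a real PSYM_ENUMERATESYMBOLS_CALLBACK gets a PSYMBOL_INFO, the symbol size, and the UserContext pointer) and indexes them by lowercased name, mirroring SYMOPT_CASE_INSENSITIVE. SymbolRecord, SymbolTable, and ToLower are illustrative names, not the example code’s types.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical record holding the SYMBOL_INFO fields the debugger cares about.
struct SymbolRecord
{
    std::string name;
    std::uint64_t address;
    std::uint32_t size;
};

// Lowercases a name so that lookups ignore case.
inline std::string ToLower(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
        [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return text;
}

// Hypothetical store filled once during enumeration and queried later.
class SymbolTable
{
public:
    // Shaped like an enumeration callback: record the symbol and return true
    // so that enumeration continues.
    bool OnSymbol(const SymbolRecord &symbol)
    {
        m_byName[ToLower(symbol.name)] = symbol;
        return true;
    }

    // Returns the symbol with the given name (any case), or nullptr.
    const SymbolRecord *FindByName(const std::string &name) const
    {
        auto it = m_byName.find(ToLower(name));
        return it == m_byName.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, SymbolRecord> m_byName;
};
```

A real callback could recover the table from its UserContext pointer and forward pSymInfo->Name, Address, and Size to OnSymbol. Since a later symbol with the same name overwrites an earlier one here, a std::multimap would be a reasonable choice if duplicates matter.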

Resolving Symbols

There are several ways to resolve symbols, but the two most common are by name and by address, via SymFromName and SymFromAddr respectively. Both populate a SYMBOL_INFO structure, just as SymEnumSymbols does for its callback. Invoking them is also straightforward:

const bool Symbols::SymbolFromAddress(const DWORD64 dwAddress, const SymbolInfo **pFullSymbolInfo)
{
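    // Round the buffer up to whole ULONG64s so that SYMBOL_INFO is suitably aligned;
    // the space beyond sizeof(SYMBOL_INFO) holds the symbol name.
    ULONG64 pBuffer[(sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(CHAR)
        + sizeof(ULONG64) - 1) / sizeof(ULONG64)] = { 0 };
    PSYMBOL_INFO pSymInfo = (PSYMBOL_INFO)pBuffer;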
 
    pSymInfo->SizeOfStruct = sizeof(SYMBOL_INFO);
    pSymInfo->MaxNameLen = MAX_SYM_NAME;
 
    DWORD64 dwDisplacement = 0;
    const bool bSuccess = BOOLIFY(SymFromAddr(m_hProcess, dwAddress, &dwDisplacement, pSymInfo));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not retrieve symbol from address %p. Error = %X.\n",
            (DWORD_PTR)dwAddress, GetLastError());
        return false;
    }
 
    fprintf(stderr, "Symbol found at %p. Name: %.*s. Base address of module: %p\n",
        (DWORD_PTR)dwAddress, pSymInfo->NameLen, pSymInfo->Name, (DWORD_PTR)pSymInfo->ModBase);
 
    *pFullSymbolInfo = FindSymbolByName(pSymInfo->Name);
 
    return bSuccess;
}
 
const bool Symbols::SymbolFromName(const char * const pName, const SymbolInfo **pFullSymbolInfo)
{
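    // Round the buffer up to whole ULONG64s so that SYMBOL_INFO is suitably aligned;
    // the parentheses matter: without them, 1 / sizeof(ULONG64) evaluates to 0.
    ULONG64 pBuffer[(sizeof(SYMBOL_INFO) + MAX_SYM_NAME * sizeof(CHAR)
        + sizeof(ULONG64) - 1) / sizeof(ULONG64)] = { 0 };
    PSYMBOL_INFO pSymInfo = (PSYMBOL_INFO)pBuffer;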
 
    pSymInfo->SizeOfStruct = sizeof(SYMBOL_INFO);
    pSymInfo->MaxNameLen = MAX_SYM_NAME;
 
    const bool bSuccess = BOOLIFY(SymFromName(m_hProcess, pName, pSymInfo));
    if (!bSuccess)
    {
        fprintf(stderr, "Could not retrieve symbol for name %s. Error = %X.\n",
            pName, GetLastError());
        return false;
    }
 
    fprintf(stderr, "Symbol found for %s. Name: %.*s. Address: %p. Base address of module: %p\n",
        pName, pSymInfo->NameLen, pSymInfo->Name, (DWORD_PTR)pSymInfo->Address,
        (DWORD_PTR)pSymInfo->ModBase);
 
    *pFullSymbolInfo = FindSymbolByAddress((DWORD_PTR)pSymInfo->Address);
 
    return bSuccess;
}

with the SymbolInfo structure being an extended structure that holds information about source files and line numbers (see example code).
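SymFromAddr reports the symbol that contains an address together with a displacement, i.e. how far past the symbol’s start the address lies. The portable sketch below reproduces that idea over a sorted std::map; NamedRange and FindContainingSymbol are hypothetical, and rejecting addresses beyond a symbol’s size is an assumption about how gaps between symbols might be handled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical symbol entry: a name plus the number of bytes of code it covers.
struct NamedRange
{
    std::string name;
    std::uint64_t size;
};

// Finds the symbol whose [start, start + size) range contains the address and
// reports how far into that symbol the address lies.
inline bool FindContainingSymbol(const std::map<std::uint64_t, NamedRange> &symbols,
    std::uint64_t address, std::string &name, std::uint64_t &displacement)
{
    auto it = symbols.upper_bound(address); // first symbol starting after address
    if (it == symbols.begin())
    {
        return false; // address is below every symbol
    }
    --it; // closest symbol starting at or before address
    const std::uint64_t offset = address - it->first;
    if (offset >= it->second.size)
    {
        return false; // address lies past the end of that symbol
    }
    name = it->second.name;
    displacement = offset;
    return true;
}
```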

Testing the functionality

To test this functionality, we can take the sample program from the previous post (reproduced below) and see the difference in how call stacks look. This version adds the ability to resolve symbols for the addresses in the call stack. The debugger was also augmented with two new abilities: dumping all symbols from a module, and setting/removing breakpoints on a symbol by name.

#include <cstdio>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    printf("c called.\n");
    d();
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

Setting a breakpoint on the d function and printing the call stack shows the difference between the previous version of the debugger and this one. The input typed in was a, s, d, and l; the “Symbol found”, “Symbol name”, “Symbol address”, “Address displacement”, “Source file”, and “Line number” lines are the new symbol information.

a
[A]ddress or [s]ymbol name? s
Name: d
Received breakpoint at address 00401090.
Press c to continue or s to begin stepping.
l
Frame: 0
Execution address: 00401090
Stack address: 00000000
Frame address: 0018FDE8
Symbol name: d
Symbol address: 00401090
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 4
Frame: 1
Execution address: 0040107C
Stack address: 00000000
Frame address: 0018FDEC
Symbol found at 0040107C. Name: c. Base address of module: 00400000
Symbol name: c
Symbol address: 00401060
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 9
Frame: 2
Execution address: 0040104C
Stack address: 00000000
Frame address: 0018FE40
Symbol found at 0040104C. Name: b. Base address of module: 00400000
Symbol name: b
Symbol address: 00401030
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 15
Frame: 3
Execution address: 0040101C
Stack address: 00000000
Frame address: 0018FE94
Symbol found at 0040101C. Name: a. Base address of module: 00400000
Symbol name: a
Symbol address: 00401000
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 21
Frame: 4
Execution address: 004010EF
Stack address: 00000000
Frame address: 0018FEE8
Symbol found at 004010EF. Name: main. Base address of module: 00400000
Symbol name: main
Symbol address: 004010B0
Address displacement: 0
Source file: c:\users\demo\desktop\demoapp\source.cpp
Line number: 27
Frame: 5
Execution address: 004013A9
Stack address: 00000000
Frame address: 0018FF3C
Symbol found at 004013A9. Name: __tmainCRTStartup. Base address of module: 00400000
Symbol name: __tmainCRTStartup
Symbol address: 00401210
Address displacement: 0
Source file: f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c
Line number: 473
Frame: 6
Execution address: 004014ED
Stack address: 00000000
Frame address: 0018FF8C
Symbol found at 004014ED. Name: mainCRTStartup. Base address of module: 00400000
Symbol name: mainCRTStartup
Symbol address: 004014E0
Address displacement: 0
Source file: f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c
Line number: 456
Frame: 7
Execution address: 76AE919F
Stack address: 00000000
Frame address: 0018FF94
Symbol found at 76AE919F. Name: BaseThreadInitThunk. Base address of module: 00000000
Symbol name: BaseThreadInitThunk
Symbol address: 76AE9191
Address displacement: 0
Source file: (null)
Line number: 0
Frame: 8
Execution address: 77430BBB
Stack address: 00000000
Frame address: 0018FFA0
Symbol found at 77430BBB. Name: RtlInitializeExceptionChain. Base address of module: 00000000
Symbol name: RtlInitializeExceptionChain
Symbol address: 77430B37
Address displacement: 0
Source file: (null)
Line number: 0
Frame: 9
Execution address: 77430B91
Stack address: 00000000
Frame address: 0018FFE4
Symbol found at 77430B91. Name: RtlInitializeExceptionChain. Base address of module: 00000000
Symbol name: RtlInitializeExceptionChain
Symbol address: 77430B37
Address displacement: 0
Source file: (null)
Line number: 0
StackWalk64 finished.

This looks much more useful than the bare addresses of the previous version. For some symbols, the source files can even be found on the host machine and presented to the user alongside the raw assembly. Additionally, symbols can be dumped for any module, as shown below:

y
Enter in module name to dump symbols for: kernel32.dll
Symbol name: QuirkIsEnabledWorker
Symbol address: 76AE0010
Address displacement: 0
Source file: (null)
Line number: 0
Symbol name: EnumCalendarInfoExEx
Symbol address: 76AE03BD
Address displacement: 0
Source file: (null)
Line number: 0
Symbol name: GetFileMUIPath
Symbol address: 76AE03CE
Address displacement: 0
Source file: (null)
Line number: 0
...

That concludes the topic of symbols. The implementation presented here only scratches the surface of the DbgHelp API, and I recommend that those interested explore the MSDN documentation further. The next article will conclude the series with a collection of miscellaneous features that debuggers typically possess. It will probably include the ability to step over code (step into is currently implemented), present a disassembly listing to the user for x86 and x64, and modify arbitrary memory instead of just registers and/or a thread context.

Article Roadmap
Future posts will cover topics closely following the items below:

  • Basics
  • Adding/Removing Breakpoints, Single-stepping
  • Call Stack, Registers, Contexts
  • Symbols
  • Miscellaneous Features

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

December 5, 2014

Writing a Primitive Debugger: Part 3 (Call Stack, Registers, Contexts)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 6:47 PM

Up to now, all of the functionality discussed has been about getting a debugger attached to a process and being able to set breakpoints and perform single stepping. While certainly useful, this is more passive debugging: you can break the process at a certain point and instrument it at the instruction level, but you cannot modify any behavior, or even view how the process got to that state. The next core features covered are viewing and changing the program’s execution state (in the form of the thread context, namely registers), and viewing the thread’s call stack upon hitting a breakpoint.

Thread Contexts

A thread context, as defined for Windows, “includes the thread’s set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread’s process.” For a user-mode debugger, which is what these posts develop, the important parts are the machine registers and the user stack. The thread environment block is also accessible from user mode but won’t be covered here due to its undocumented and very specific nature. When a process starts up, the loader sets up the process’s main thread and begins execution at the entry point. This main thread can in turn launch additional threads, which can themselves launch threads, and so on. Each of these threads has its own context containing the items listed above.

The purpose of these contexts is that Windows, being a preemptive multitasking operating system, can interrupt any [user-mode] task, such as a thread executing code, at any point in time. During these interruptions a context switch is carried out, which is simply the process of saving the current execution context and loading the next one to execute. Eventually, when the original task is scheduled to resume, another context switch occurs back to the original thread’s context, and it continues executing as if nothing had happened. What do these contexts look like? The answer is entirely processor-specific, which shouldn’t be too surprising given that they store registers.

In Windows, the part of the thread context that is available to developers comes defined as a CONTEXT structure in winnt.h. For example, below is a snippet from a CONTEXT structure for x86 processors.

typedef struct _CONTEXT {
    DWORD   Dr0;
    DWORD   Dr1;
    DWORD   Dr2;
    ...
    FLOATING_SAVE_AREA FloatSave;
    DWORD   SegGs;
    DWORD   SegFs;
    ...
    DWORD   Edi;
    DWORD   Esi;
    DWORD   Ebx;
    ...
    DWORD   Ebp;
    DWORD   Eip;
    ...

The x64 version is closely related, with registers widened to 64 bits and additional registers and extensions added.

typedef struct DECLSPEC_ALIGN(16) _CONTEXT {
    DWORD ContextFlags;
    DWORD MxCsr;
    ...
    WORD   SegGs;
    WORD   SegSs;
    DWORD EFlags;
    ...
    DWORD64 Rax;
    DWORD64 Rcx;
    DWORD64 Rdx;
    ...
    DWORD64 R13;
    DWORD64 R14;
    DWORD64 R15;
    ...

This is the structure that is most useful to inspect and modify when debugging. A debugger should be able to print out this structure and allow modification of any of its fields. Fortunately, there are two very useful APIs for retrieving and modifying it: GetThreadContext and SetThreadContext. These were covered previously when discussing how to enable single-stepping, where the context had to be retrieved and the EFlags register modified. So what changes are needed to the existing code in order to add this functionality? It’s as simple as opening a handle to the currently executing (or, in the debugger’s case, broken-into) thread and retrieving or setting the context.

const CONTEXT Debugger::GetExecutingContext()
{
    CONTEXT ctx = { 0 };
    ctx.ContextFlags = CONTEXT_ALL;
    SafeHandle hThread = OpenCurrentThread();
    if (hThread.IsValid())
    {
        bool bSuccess = BOOLIFY(GetThreadContext(hThread(), &ctx));
        if (!bSuccess)
        {
            fprintf(stderr, "Could not get context for thread %X. Error = %X\n", m_dwExecutingThreadId, GetLastError());
        }
    }
 
    memcpy(&m_lastContext, &ctx, sizeof(CONTEXT));
 
    return ctx;
}
 
const bool Debugger::SetExecutingContext(const CONTEXT &ctx)
{
    bool bSuccess = false;
    SafeHandle hThread = OpenCurrentThread();
    if (hThread.IsValid())
    {
        bSuccess = BOOLIFY(SetThreadContext(hThread(), &ctx));
    }
 
    memcpy(&m_lastContext, &ctx, sizeof(CONTEXT));
 
    return bSuccess;
}

For each access or modification, a handle to the current thread is opened (and closed); this certainly isn’t the most efficient approach, but it serves well enough for demo purposes. The state of the context is then stored in m_lastContext. These functions are invoked when the process receives an EXCEPTION_BREAKPOINT exception and when single stepping, i.e. when handling the EXCEPTION_SINGLE_STEP exception. Therefore, m_lastContext always holds the appropriate register values when a breakpoint is hit or when the user is single stepping. These functions can also be invoked when the user wants to modify one or more registers through the debugger interface. Printing the context involves nothing more than printing the values in the structure. I’ve chosen to print only the more commonly used registers in the example code:

void Debugger::PrintContext()
{
#ifdef _M_IX86
    fprintf(stderr, "EAX: %p EBX: %p ECX: %p EDX: %p\n"
        "ESP: %p EBP: %p ESI: %p EDI: %p\n"
        "EIP: %p FLAGS: %X\n",
        m_lastContext.Eax, m_lastContext.Ebx, m_lastContext.Ecx, m_lastContext.Edx,
        m_lastContext.Esp, m_lastContext.Ebp, m_lastContext.Esi, m_lastContext.Edi,
        m_lastContext.Eip, m_lastContext.EFlags);
#elif defined _M_AMD64
    fprintf(stderr, "RAX: %p RBX: %p RCX: %p RDX: %p\n"
        "RSP: %p RBP: %p RSI: %p RDI: %p\n"
        "R8: %p R9: %p R10: %p R11: %p\n"
        "R12: %p R13: %p R14: %p R15: %p\n"
        "RIP: %p FLAGS: %X\n",
        m_lastContext.Rax, m_lastContext.Rbx, m_lastContext.Rcx, m_lastContext.Rdx,
        m_lastContext.Rsp, m_lastContext.Rbp, m_lastContext.Rsi, m_lastContext.Rdi,
        m_lastContext.R8, m_lastContext.R9, m_lastContext.R10, m_lastContext.R11,
        m_lastContext.R12, m_lastContext.R13, m_lastContext.R14, m_lastContext.R15,
        m_lastContext.Rip, m_lastContext.EFlags);
#else
#error "Unsupported architecture"
#endif
}
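The format strings above hand 32- and 64-bit integers to %p, which MSVC happens to print as pointer-width hex, although %p formally expects a void *. One portable alternative is to format each register explicitly with a fixed width. FormatRegister below is a hypothetical helper, not part of the example code.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats one register as "NAME: value", zero-padded to the register width:
// 8 hex digits for 32-bit registers, 16 for 64-bit ones.
inline std::string FormatRegister(const char *name, std::uint64_t value, int widthInBits)
{
    char buffer[64] = { 0 };
    std::snprintf(buffer, sizeof(buffer), "%s: %0*" PRIX64, name, widthInBits / 4, value);
    return buffer;
}
```

For example, FormatRegister("EIP", m_lastContext.Eip, 32) would produce an eight-digit value such as “EIP: 00401090”, and the 64-bit variant pads to sixteen digits, so columns line up regardless of the value.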

Call Stacks

At the lowest level, the scope of a function is defined by its stack frame. This is a compiler- and/or ABI-defined construct describing how the function’s state is laid out. A stack frame typically includes the return address into the caller, any parameters passed to the function by the caller, and space for local variables within the scope of the function. On x86 and x64, among other architectures, the function body is preceded by a prologue, the code responsible for moving the stack and frame pointers (ESP/EBP or RSP/RBP) from the caller’s frame to the callee’s. Before the callee returns, an epilogue restores the stack and frame pointers to those of the caller. For example, consider the following C function:

void TestFunction(int a, int b, int c)
{
    int d = 4, e = 5, f = 6;
}

which is called in the following way:

push        3  
push        2  
push        1  
call        TestFunction

Disassembled as x86, this becomes:


push        ebp  
mov         ebp,esp  
sub         esp,0Ch  
mov         dword ptr [ebp-4],4  
mov         dword ptr [ebp-8],5  
mov         dword ptr [ebp-0Ch],6  
mov         esp,ebp  
pop         ebp  
ret         0Ch  

The prologue is the first three instructions (push ebp, mov ebp,esp, sub esp,0Ch) and the epilogue is the last three (mov esp,ebp, pop ebp, ret 0Ch). After the prologue executes, the stack frame for this function contains the caller’s frame pointer at [EBP], the return address at [EBP+4] (because the CALL instruction implicitly pushes the address of the next instruction onto the stack before transferring execution), and the passed parameters at [EBP+8], [EBP+12], and [EBP+16]. The prologue subtracts 12 (0Ch) from ESP to make room for the local variables, the three 32-bit ints declared within the function. These are at [EBP-4], [EBP-8], and [EBP-12], as can be seen in the disassembly.
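The offsets in that paragraph follow mechanically from the 32-bit layout. The hypothetical helpers below encode that arithmetic, assuming 4-byte arguments pushed right to left, a standard push ebp / mov ebp,esp prologue, and 4-byte locals placed downward from EBP in declaration order as in this particular listing; other compilers or optimization settings may lay locals out differently.

```cpp
#include <cassert>

// Saved caller EBP sits at [EBP]; the return address pushed by CALL at [EBP+4].
constexpr int SavedFrameOffset = 0;
constexpr int ReturnAddressOffset = 4;

// EBP-relative offset of the zero-based stack argument index.
constexpr int ArgumentOffset(int index) { return 8 + 4 * index; }

// EBP-relative offset of the zero-based 4-byte local index.
constexpr int LocalOffset(int index) { return -4 * (index + 1); }
```

With TestFunction, ArgumentOffset(2) gives 16 for c and LocalOffset(2) gives -12 for f, matching [EBP+16] and [EBP-0Ch] in the disassembly.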

This setup is convenient because it makes it easy to distinguish parameters from local variables. Debugging becomes a bit easier since everything is held on the stack and indexed through the frame pointer, rather than scattered between registers and the stack. This changes somewhat on x64, where the first four (Microsoft x64 calling convention) or six (System V AMD64 ABI) integer arguments are passed in registers and the rest on the stack. The layout can also vary with calling conventions and compiler optimizations, especially frame-pointer omission.
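To make the x64 difference concrete: the Microsoft x64 calling convention passes the first four integer or pointer arguments in RCX, RDX, R8, and R9, while the System V AMD64 ABI (Linux, macOS) passes the first six in RDI, RSI, RDX, RCX, R8, and R9. The hypothetical helper below reports where a given integer argument lives under each convention; it ignores floating-point arguments, which use XMM registers.

```cpp
#include <cassert>
#include <string>

enum class Abi { MicrosoftX64, SystemV };

// Returns the register holding the zero-based (index >= 0) integer/pointer
// argument, or "stack" when that argument is passed in memory instead.
inline std::string IntegerArgumentLocation(Abi abi, int index)
{
    static const char *const kMicrosoft[] = { "RCX", "RDX", "R8", "R9" };
    static const char *const kSystemV[] = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };

    if (abi == Abi::MicrosoftX64)
    {
        return index < 4 ? kMicrosoft[index] : "stack";
    }
    return index < 6 ? kSystemV[index] : "stack";
}
```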

Since the stack frame stores the return address into the caller, it is possible to see where the function was called from. That is what the call stack is: a collection of stack frames that represent the call chain leading up to the current stack frame. This information is very useful in debugging, because a bug that presents itself in one function may have originated earlier in the code. Being able to quickly traverse frames, and see the values within those frames, is an invaluable aid to debugging.

On the Windows platform, there is a convenient function that performs the tedium of walking stack frames backwards: StackWalk64. This function supports both x86 and x64, but requires some setup before being invoked. Given the very machine-specific layout of stack frames, StackWalk64 requires a STACKFRAME64 structure to be filled out and passed to it as an argument. Filling out this structure merely involves setting the instruction, frame, and stack pointers, along with their address modes, which will be flat addressing on modern Windows for x86 and x64. Once this structure is set up, StackWalk64 can be called in a loop to retrieve the frames. Put into code, it looks like the following:

void Debugger::PrintCallStack()
{
    STACKFRAME64 stackFrame = { 0 };
    const DWORD_PTR dwMaxFrames = 50;
    CONTEXT ctx = GetExecutingContext();
 
    stackFrame.AddrPC.Mode = AddrModeFlat;
    stackFrame.AddrFrame.Mode = AddrModeFlat;
    stackFrame.AddrStack.Mode = AddrModeFlat;
 
#ifdef _M_IX86
    DWORD dwMachineType = IMAGE_FILE_MACHINE_I386;
    stackFrame.AddrPC.Offset = ctx.Eip;
    stackFrame.AddrFrame.Offset = ctx.Ebp;
    stackFrame.AddrStack.Offset = ctx.Esp;
#elif defined _M_AMD64
    DWORD dwMachineType = IMAGE_FILE_MACHINE_AMD64;
    stackFrame.AddrPC.Offset = ctx.Rip;
    stackFrame.AddrFrame.Offset = ctx.Rbp;
    stackFrame.AddrStack.Offset = ctx.Rsp;
#else
#error "Unsupported platform"
#endif
 
    SafeHandle hThread = OpenCurrentThread();
    for (int i = 0; i < dwMaxFrames; ++i)
    {
        const bool bSuccess = BOOLIFY(StackWalk64(dwMachineType, m_hProcess(), hThread(), &stackFrame,
            (dwMachineType == IMAGE_FILE_MACHINE_I386 ? nullptr : &ctx), nullptr,
            SymFunctionTableAccess64, SymGetModuleBase64, nullptr));
        if (!bSuccess || stackFrame.AddrPC.Offset == 0)
        {
            fprintf(stderr, "StackWalk64 finished.\n");
            break;
        }
 
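        // The Offset fields are DWORD64; cast them to pointer width so that %p consumes
        // the right number of bytes (uncast, the variadic arguments misalign on x86).
        fprintf(stderr, "Frame: %X\n"
            "Execution address: %p\n"
            "Stack address: %p\n"
            "Frame address: %p\n",
            i, (DWORD_PTR)stackFrame.AddrPC.Offset,
            (DWORD_PTR)stackFrame.AddrStack.Offset, (DWORD_PTR)stackFrame.AddrFrame.Offset);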
    }
}
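StackWalk64 hides the mechanics, but for code compiled with frame pointers they come down to following the chain of saved EBP values described earlier: frame 0 comes from EIP, each [EBP+4] holds a return address into the caller, and each [EBP] holds the caller’s EBP. The sketch below walks such a chain over a simulated stack. StackMemory and WalkFramePointers are hypothetical, and a real debugger would read the target’s memory (for example with ReadProcessMemory) instead of consulting a std::map. The approach breaks down on frame-pointer-omitted code, which is a good reason to rely on StackWalk64 and its symbol-handler callbacks instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Simulated 32-bit stack memory: address -> 4-byte value.
typedef std::map<std::uint32_t, std::uint32_t> StackMemory;

// Follows the saved-EBP chain starting at framePointer and collects return
// addresses, stopping at a null, unreadable, or non-ascending frame, or after
// maxFrames frames.
inline std::vector<std::uint32_t> WalkFramePointers(const StackMemory &memory,
    std::uint32_t framePointer, std::size_t maxFrames)
{
    std::vector<std::uint32_t> returnAddresses;
    while (framePointer != 0 && returnAddresses.size() < maxFrames)
    {
        auto savedEbp = memory.find(framePointer);          // [EBP]
        auto returnAddress = memory.find(framePointer + 4); // [EBP+4]
        if (savedEbp == memory.end() || returnAddress == memory.end())
        {
            break; // frame points outside readable memory
        }
        returnAddresses.push_back(returnAddress->second);
        if (savedEbp->second <= framePointer)
        {
            break; // the stack grows down, so caller frames must sit higher
        }
        framePointer = savedEbp->second;
    }
    return returnAddresses;
}
```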

Testing the functionality

To test this functionality, we can create another demo app to serve as the debug target. The simple one below is what I used:

#include <cstdio>
 
void d()
{
    printf("d called.\n");
}
 
void c()
{
    printf("c called.\n");
    d();
}
 
void b()
{
    printf("b called.\n");
    c();
}
 
void a()
{
    printf("a called.\n");
    b();
}
 
int main(int argc, char *argv[])
{
    printf("Addresses: \n"
        "a: %p\n"
        "b: %p\n"
        "c: %p\n"
        "d: %p\n",
        a, b, c, d);
 
    getchar();
    while (true)
    {
        a();
        getchar();
    }
 
    return 0;
}

I would recommend disabling incremental linking and ASLR (on the executable, not the system) for convenience’s sake. Below is the call stack that Visual Studio shows when a breakpoint set inside the d function is hit.

Demo.exe!d() Line 5	C++
Demo.exe!c() Line 14	C++
Demo.exe!b() Line 20	C++
Demo.exe!a() Line 26	C++
Demo.exe!main(int argc, char * * argv) Line 41	C++
Demo.exe!__tmainCRTStartup() Line 626	C
Demo.exe!mainCRTStartup() Line 466	C
kernel32.dll!@BaseThreadInitThunk@12()	Unknown
ntdll.dll!___RtlUserThreadStart@8()	Unknown
ntdll.dll!__RtlUserThreadStart@8()	Unknown

Attaching with the debugger also yields 10 frames, as listed below:

a
Target address: 0x4010f0
Received breakpoint at address 004010F0
Press c to continue or s to begin stepping.
l
Frame: 0
Execution address: 004010F0
Stack address: 00000000
Frame address: 0018FBE4
Frame: 1
Execution address: 004010DA
Stack address: 00000000
Frame address: 0018FBE8
Frame: 2
Execution address: 0040108A
Stack address: 00000000
Frame address: 0018FCBC
Frame: 3
Execution address: 0040103A
Stack address: 00000000
Frame address: 0018FD90
Frame: 4
Execution address: 004011C6
Stack address: 00000000
Frame address: 0018FE64
Frame: 5
Execution address: 00401699
Stack address: 00000000
Frame address: 0018FF38
Frame: 6
Execution address: 004017DD
Stack address: 00000000
Frame address: 0018FF88
Frame: 7
Execution address: 75D5338A
Stack address: 00000000
Frame address: 0018FF90
Frame: 8
Execution address: 77339F72
Stack address: 00000000
Frame address: 0018FF9C
Frame: 9
Execution address: 77339F45
Stack address: 00000000
Frame address: 0018FFDC
StackWalk64 finished.

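The output is a bit less elegant than the Visual Studio debugger’s, but it walks the same ten frames, which is the more important part. One quirk: in this x86 output every Stack address reads 00000000 and the Frame address values look like stack pointer values, which is what happens on x86 when the 64-bit Offset fields are passed to %p without a cast to pointer width. It would also be nice to put names to some of those addresses. That is where symbol loading and mapping come in, which will be the subject of the next post.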

The full source code relating to this can be found here. C++11 features were used, so MSVC 2012/2013 is most likely required.

