RCE Endeavors 😅

April 8, 2015

Reverse Engineering Vectored Exception Handlers: Structures (1/3)

Filed under: General x86,General x86-64,Programming,Reverse Engineering — admin @ 5:02 PM

This series of posts will cover the details of reverse engineering the AddVectoredExceptionHandler function, a Windows API function responsible for registering a special type of exception handler at runtime. The series will be split into three parts: first identifying the key structures that are used, second understanding the implementation, and lastly re-implementing the reverse engineered assembly as working C code. This reverse engineered implementation will behave identically to the original function and, presumably under the same compiler options, would produce a very close assembly listing. The reverse engineering was done on Windows 7, so there will be slight differences in assembly listings if you are following along on a different version. The re-implementation code (part 3) was tested on Windows 7 and 8.1 on x86 and x64, so the high-level details should not change.

Starting out

The goal is to see how AddVectoredExceptionHandler works. This means tracing it from an example program through the kernel32.dll export and into ntdll.dll, where the Rtl* implementation resides. Naturally, the best way to go about doing this is with a debugger. The Visual Studio debugger will be the debugger of choice for this series since we’ll be debugging our own code.


Stepping into the disassembly shows that AddVectoredExceptionHandler calls _RtlAddVectoredExceptionHandler, which in turn is a wrapper for _RtlpAddVectoredHandler. The assembly listing for _RtlAddVectoredExceptionHandler is shown below:

_RtlAddVectoredExceptionHandler@8:
771F742B  mov         edi,edi  
771F742D  push        ebp  
771F742E  mov         ebp,esp  
771F7430  push        0  
771F7432  push        dword ptr [ebp+0Ch]  
771F7435  push        dword ptr [ebp+8]  
771F7438  call        _RtlpAddVectoredHandler@12 (771E3621h)  
771F743D  pop         ebp  
771F743E  ret         8

This code simply pushes an extra constant parameter and invokes _RtlpAddVectoredHandler(FirstHandler, VectoredHandler, 0). The actual details reside in _RtlpAddVectoredHandler, reproduced in its entirety below:

_RtlpAddVectoredHandler@12:
771E3621  mov         edi,edi  
771E3623  push        ebp  
771E3624  mov         ebp,esp  
771E3626  mov         eax,dword ptr fs:[00000018h]  
771E362C  mov         eax,dword ptr [eax+30h]  
771E362F  push        esi  
771E3630  push        10h  
771E3632  push        0  
771E3634  push        dword ptr [eax+18h]  
771E3637  call        _RtlAllocateHeap@12 (771AE026h)  
771E363C  mov         esi,eax  
771E363E  test        esi,esi  
771E3640  je          _RtlpAddVectoredHandler@12+83h (771E36A4h)  
771E3642  push        ebx  
771E3643  push        edi  
771E3644  push        dword ptr [ebp+0Ch]  
771E3647  mov         dword ptr [esi+8],1  
771E364E  call        _RtlEncodePointer@4 (771C0FCBh)  
771E3653  mov         ebx,dword ptr [ebp+10h]  
771E3656  imul        ebx,ebx,0Ch  
771E3659  add         ebx,77284724h  
771E365F  push        ebx  
771E3660  mov         dword ptr [esi+0Ch],eax  
771E3663  lea         edi,[ebx+4]  
771E3666  call        _RtlAcquireSRWLockExclusive@4 (771B29F1h)  
771E366B  cmp         dword ptr [edi],edi  
771E366D  jne         _RtlpAddVectoredHandler@12+65h (771E3686h)  
771E366F  mov         ecx,dword ptr fs:[18h]  
771E3676  mov         eax,dword ptr [ebp+10h]  
771E3679  mov         ecx,dword ptr [ecx+30h]  
771E367C  add         eax,2  
771E367F  add         ecx,28h  
771E3682  lock bts    dword ptr [ecx],eax  
771E3686  cmp         dword ptr [ebp+8],0  
771E368A  je          _RtlpAddVectoredHandler@12+13DF3h (771F7414h)  
    ----> Jump resolved below
    771F7414  mov         eax,dword ptr [edi+4]  
    771F7417  mov         dword ptr [esi],edi  
    771F7419  mov         dword ptr [esi+4],eax  
    771F741C  mov         dword ptr [eax],esi  
    771F741E  mov         dword ptr [edi+4],esi  
    771F7421  jmp         _RtlpAddVectoredHandler@12+7Bh (771E369Ch)  
771E3690  mov         eax,dword ptr [edi]  
771E3692  mov         dword ptr [esi],eax  
771E3694  mov         dword ptr [esi+4],edi  
771E3697  mov         dword ptr [eax+4],esi  
771E369A  mov         dword ptr [edi],esi  
771E369C  push        ebx  
771E369D  call        _RtlReleaseSRWLockExclusive@4 (771B29ABh)  
771E36A2  pop         edi  
771E36A3  pop         ebx  
771E36A4  mov         eax,esi  
771E36A6  pop         esi  
771E36A7  pop         ebp  
771E36A8  ret         0Ch  

Decoding the Assembly

Don’t mind the gratuitous highlighting above; it is there to highlight individual pieces of the function and make it more manageable to understand. The function begins by performing a call to RtlAllocateHeap, highlighted in orange. The three parameters provided are [EAX+0x18], 0, and 16 (0x10). EAX is initially loaded with the address of the TIB structure from FS:[0x18] (light pink). From this structure, the PEB structure is then retrieved at [TIB+0x30]. The member at [PEB+0x18], which is documented as ProcessHeap, is then given to RtlAllocateHeap. Everything here seems to make sense so far.

Next, in green, comes a call to RtlEncodePointer, which is the implementation of EncodePointer. The address of the vectored handler, at [EBP+0xC], is given as the argument here. This function, as its name implies, is responsible for encoding the provided pointer. It does this by performing an XOR with a cookie value generated at runtime.
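As a rough illustration of the idea, the sketch below XORs a pointer with a cookie and then reverses the operation. The names g_cookie, EncodePtr, and DecodePtr are made up for this example, the cookie is a fixed constant purely for clarity, and the real RtlEncodePointer may apply further transformations, so this should not be read as the actual ntdll implementation.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-process cookie; the real value is
// generated at runtime and is not a compile-time constant.
static const std::uintptr_t g_cookie = 0x5A5AA5A5u;

// Encode a pointer by XORing its bits with the cookie (assumed scheme).
inline void *EncodePtr(void *p)
{
    return reinterpret_cast<void *>(reinterpret_cast<std::uintptr_t>(p) ^ g_cookie);
}

// XOR is its own inverse, so decoding applies the same operation again.
inline void *DecodePtr(void *p)
{
    return reinterpret_cast<void *>(reinterpret_cast<std::uintptr_t>(p) ^ g_cookie);
}
```

The point of storing the encoded value is that the raw handler address never sits in the heap allocation; anything that wants to call the handler has to decode it first.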

From earlier, it should be noticed that the requested allocation size provided to RtlAllocateHeap was 16 bytes (0x10). The next few instructions give some information about how this returned memory is accessed. The instructions in black move two values into this memory region, one at 0x8 and one at 0xC. Given this information, it is safe to assume that what is being allocated is a 16 byte struct. The third field at +0x8 is always set to 1 in this function, and the fourth at +0xC is set to hold the encoded handler address. It’s possible to write out a basic definition for this struct at this point:

struct MysteryStruct
{
    DWORD dwUnknown1;                             // +0x0
    DWORD dwUnknown2;                             // +0x4
    DWORD dwAlwaysOne;                            // +0x8
    PVECTORED_EXCEPTION_HANDLER pVectoredHandler; // +0xC
};

This definition will be revisited and completed later.

The next block of code, in teal, performs some arithmetic operations. It loads [EBP+0x10], which was shown to be always 0 (from _RtlAddVectoredExceptionHandler), into EBX. This value is multiplied by 12 (0xC), which still yields zero. Then the value 0x77284724 is added to it. Checking what resides at this address in a debugger shows something interesting:

_LdrpVectorHandlerList:
77284724 01 00                add         dword ptr [eax],eax  
77284726 00 00                add         byte ptr [eax],al  
77284728 28 47 28             sub         byte ptr [edi+28h],al  
7728472B 77 28                ja          _RtlpProcessHeapsListBuffer+15h (77284755h)  
7728472D 47                   inc         edi  
7728472E 28 77 00             sub         byte ptr [edi],dh  
...

It turns out that 0x77284724 is the address of the symbol _LdrpVectorHandlerList. The nonsense assembly instructions there are simply the disassembler's attempt to decode _LdrpVectorHandlerList’s data members as code. The base of this structure is used as an argument for _RtlAcquireSRWLockExclusive, which is the implementation of AcquireSRWLockExclusive. This function takes a PSRWLOCK argument. Given this, it is immediately possible to deduce that the first member of _LdrpVectorHandlerList is an SRWLOCK structure. More about this structure will be revealed later.
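The multiply-then-add pair from the teal block reads like ordinary array indexing: a base address plus an index times an element size. The sketch below just restates that arithmetic; kListBase is the address from this particular debugging session, and the parameter name type is my own guess at what the third argument represents.

```cpp
#include <cstdint>

// Base address of _LdrpVectorHandlerList as seen in this debugging
// session; it will differ between systems and reboots.
const std::uint32_t kListBase = 0x77284724u;

// Mirrors "imul ebx,ebx,0Ch" followed by "add ebx,77284724h":
// element N starts N * 0xC bytes past the base.
inline std::uint32_t ListAddress(std::uint32_t type)
{
    return kListBase + type * 0xCu;
}
```

With the constant 0 passed by _RtlAddVectoredExceptionHandler, this always selects the first element, which is why the address shows up unchanged in the debugger.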

The code in bright pink begins by loading the address of the second field in the _LdrpVectorHandlerList structure into EDI. The value stored there is then compared to that address, which is basically a check of whether a pointer is pointing to itself. If that is the case, then the rest of the pink block will be executed. The code once again retrieves the PEB structure, similar to light pink. Except this time, [PEB+0x28] will be the value that ends up being used. Additionally, it loads [EBP+0x10] (always 0) into EAX and adds 2 to it. There is then an atomic bit test and set instruction that is carried out between [PEB+0x28] and 2. [PEB+0x28] has been documented as “CrossProcessFlags” and is a bit of a mystery in the context of this function.
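Both halves of the pink block can be modelled in a few lines. This is only a sketch: Link, IsListEmpty, and SetHandlerBit are names invented here, and the flags variable merely stands in for the PEB field at +0x28.

```cpp
#include <atomic>
#include <cstdint>

// Minimal list node: only the two link pointers matter for this check.
struct Link
{
    Link *next;
    Link *prev;
};

// A pointer that points back at its own location is how an empty circular
// list looks, which is what "cmp dword ptr [edi],edi" tests for.
inline bool IsListEmpty(const Link *head)
{
    return head->next == head;
}

// Models "lock bts dword ptr [ecx],eax" with eax = type + 2: atomically
// set one bit and report whether it was already set beforehand.
inline bool SetHandlerBit(std::atomic<std::uint32_t> &flags, std::uint32_t type)
{
    const std::uint32_t mask = 1u << (type + 2);
    return (flags.fetch_or(mask) & mask) != 0;
}
```

Read this way, the flag only gets touched when the first handler is about to be added to a previously empty list.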

Lastly, the block in red is where the actual interesting code happens. It begins by checking to see if the first parameter to the function, the flag saying whether the handler is to be the first or last in the chain, is zero. In either case, there are a lot of pointers moving around, judging by the instructions. One would guess that an exception handler list implementation would involve pointers to next/previous nodes. Let’s begin investigating the case where an exception handler will be added to the front of the chain (FirstHandler parameter does not equal 0). Starting at 0x771E3690, [EDI] is moved into EAX. From earlier, [EDI] holds the second member of the _LdrpVectorHandlerList structure. This is then moved into [ESI], which is the first member of the structure allocated with RtlAllocateHeap (MysteryStruct above). Then EDI (not dereferenced) is moved into [ESI+0x4].

This completes finding references to the allocated structure. RtlAllocateHeap had a request for 16 bytes, and 16 bytes have now been used/written to. ESI is then moved into [EAX+4] and [EDI], which relate to two pointers in _LdrpVectorHandlerList. The part where the handler is added to the back of the list won’t be covered in this post, since it’s basically the same thing except for which pointers get rearranged.
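For anyone who wants to see the five moves in each path side by side, here is one way to read them: as insertion into a circular doubly linked list. This is an interpretation rather than something the listing states outright, and the names Node, next, prev, InsertHead, and InsertTail are mine.

```cpp
// Node layout matching the first two fields of the allocated structure.
struct Node
{
    Node *next; // +0x0
    Node *prev; // +0x4 on x86
    int id;     // payload, only here to make the order observable
};

// Insert right after the head (the fall-through path at 0x771E3690).
inline void InsertHead(Node *head, Node *entry)
{
    Node *first = head->next;   // mov eax,[edi]
    entry->next = first;        // mov [esi],eax
    entry->prev = head;         // mov [esi+4],edi
    first->prev = entry;        // mov [eax+4],esi
    head->next = entry;         // mov [edi],esi
}

// Insert right before the head, i.e. at the back (the path at 0x771F7414).
inline void InsertTail(Node *head, Node *entry)
{
    Node *last = head->prev;    // mov eax,[edi+4]
    entry->next = head;         // mov [esi],edi
    entry->prev = last;         // mov [esi+4],eax
    last->next = entry;         // mov [eax],esi
    head->prev = entry;         // mov [edi+4],esi
}
```

Starting from an empty list, InsertHead(a), InsertHead(b), InsertTail(c) leaves the order b, a, c when walking forward from the head.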

Finalizing Structure Definitions

Going through the code revealed two main structures at work here. There is the 16 byte structure that was allocated in the beginning and the _LdrpVectorHandlerList structure. The MysteryStruct from earlier can be better defined now. I’ve renamed it as _LdrpVectorHandlerEntry to be consistent with the known _LdrpVectorHandlerList symbol.

typedef struct _LdrpVectorHandlerEntry
{
    struct _LdrpVectorHandlerEntry *pLink1;       // +0x0
    struct _LdrpVectorHandlerEntry *pLink2;       // +0x4
    DWORD dwAlwaysOne;                            // +0x8
    PVECTORED_EXCEPTION_HANDLER pVectoredHandler; // +0xC
} VECTORED_HANDLER_ENTRY, *PVECTORED_HANDLER_ENTRY;

Also, from studying the pointer swapping operations between the new entry and the list, it is possible to define _LdrpVectorHandlerList a bit more clearly as well:

typedef struct _LdrpVectorHandlerList
{
    SRWLOCK srwLock;                // +0x0
    VECTORED_HANDLER_ENTRY *pLink1; // +0x4
    VECTORED_HANDLER_ENTRY *pLink2; // +0x8
} VECTORED_HANDLER_LIST, *PVECTORED_HANDLER_LIST; // total size 0xC
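A quick sanity check on these layouts is to let the compiler confirm the offsets. The sketch below is a 32-bit model that swaps every pointer-sized field for a uint32_t, so it compiles on any host without the Windows headers; the type names with the 32 suffix are invented for this check.

```cpp
#include <cstddef>
#include <cstdint>

// 32-bit model of the two structures: every pointer-sized field is
// represented as a uint32_t so the offsets match x86 on any host.
struct VectoredHandlerEntry32
{
    std::uint32_t pLink1;           // +0x0
    std::uint32_t pLink2;           // +0x4
    std::uint32_t dwAlwaysOne;      // +0x8
    std::uint32_t pVectoredHandler; // +0xC (encoded pointer)
};

struct VectoredHandlerList32
{
    std::uint32_t srwLock; // +0x0, SRWLOCK is pointer-sized
    std::uint32_t pLink1;  // +0x4
    std::uint32_t pLink2;  // +0x8
};

static_assert(sizeof(VectoredHandlerEntry32) == 0x10, "matches the 16 byte allocation");
static_assert(offsetof(VectoredHandlerEntry32, pVectoredHandler) == 0xC, "handler at +0xC");
static_assert(sizeof(VectoredHandlerList32) == 0xC, "matches the imul by 0Ch");
static_assert(offsetof(VectoredHandlerList32, pLink1) == 0x4, "matches lea edi,[ebx+4]");
```

If any of the guesses about field sizes were off, one of these assertions would fail at compile time rather than at runtime in part 3.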

The types in these structures have been defined. The next part of this series will cover how the links behave. Follow on Twitter for more updates.

April 4, 2015

Hiding Functionality with Exception Handlers (2/2)

Filed under: General x86,Programming,Reverse Engineering — admin @ 1:49 PM

This post will cover the second part of hiding functionality with exception handlers. Unlike the technique presented in the previous post, which modified the SEH record for the local thread, the aim here is to modify the SEH record for another thread in order to better hide what is actually going on. By the end of the post, there should be enough information to put together a working application capable of modifying the SEH list of any thread (barring some exceptions) and causing it to raise an exception to execute your code. The sample application will be a DLL that is injected into a process and hijacks one of its threads to perform some task.

What is the purpose of doing all of this if you’re injecting into a process anyway? After all, you can simply spawn your own thread or likely use the one created during the injection (if CreateRemoteThread was used) and just begin executing your code. I’d argue that this technique gives more obscurity to what is happening during static analysis and is something out of the norm. Plus it’s fun!

The overall code is very similar to what the first part showed, but now there need to be a few steps added in order to get the TIB of another thread. There are usually a few different approaches, of varying complexity and reliability.

  • Do it directly. Suspend the thread and get its context. Change the instruction pointer to point to your code, which changes the SEH list and raises an interrupt, then resume the thread. Perform your task and restore the original context in your SEH handler.
  • Do it indirectly. Suspend the thread, queue an asynchronous procedure call (APC) which changes the SEH list and raises an interrupt (with QueueUserAPC), and resume the thread. The thread must be in an alertable state (waiting on something) for this to work, which is typically the case for most threads in a process.
  • Take the middle ground. Suspend the thread and get the address of its FS segment directly using GetThreadSelectorEntry. Change the SEH list from within your thread and queue an APC to raise the interrupt, resume the thread.

The easiest approach is to do it indirectly with an APC. The code is really straightforward and looks like the following:

void InstallExceptionHandler(DWORD dwThreadId)
{
    auto handle = ThreadHandleTable[dwThreadId];
 
    DWORD dwError = SuspendThread(handle);
    if (dwError == -1)
    {
        fprintf(stderr, "Could not suspend thread. Error = %X.\n",
            GetLastError());
        return;
    }
 
    CONTEXT ctx = { CONTEXT_ALL };
    GetThreadContext(handle, &ctx);
    LDT_ENTRY ldtEntry = { 0 };
 
    GetThreadSelectorEntry(handle, ctx.SegFs, &ldtEntry);
    const DWORD dwFSAddress =
        (ldtEntry.HighWord.Bits.BaseHi << 24) |
        (ldtEntry.HighWord.Bits.BaseMid << 16) |
        (ldtEntry.BaseLow);
 
    fprintf(stderr, "FS segment address of target thread should be: %X.\n",
        dwFSAddress);
 
    dwError = QueueUserAPC(APCProc, handle, 0);
    if (dwError == 0)
    {
        fprintf(stderr, "Could not queue APC to thread. Error = %X.\n",
            GetLastError());
    }
 
    dwError = ResumeThread(handle);
    if (dwError == -1)
    {
        fprintf(stderr, "Could not resume thread. Error = %X.\n",
            GetLastError());
    }
}
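The shifting and OR-ing used to build dwFSAddress is easier to check in isolation. The sketch below reproduces just that arithmetic with plain integers instead of an LDT_ENTRY, so it can be tested without a live thread; SegmentBase is a name made up for this example.

```cpp
#include <cstdint>

// Rebuild a 32-bit segment base from the three pieces stored in an x86
// descriptor: bits 0-15 (BaseLow), 16-23 (BaseMid) and 24-31 (BaseHi).
inline std::uint32_t SegmentBase(std::uint16_t baseLow, std::uint8_t baseMid, std::uint8_t baseHi)
{
    return (static_cast<std::uint32_t>(baseHi) << 24) |
           (static_cast<std::uint32_t>(baseMid) << 16) |
           static_cast<std::uint32_t>(baseLow);
}
```

For example, BaseLow = 0xE000, BaseMid = 0xFD, and BaseHi = 0x7E combine into 0x7EFDE000.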

Here the suspend/queue/resume wording is put directly into code (with extra debug output). When the thread resumes, APCProc will be invoked. APCProc will be running in the context of the target thread and is responsible for modifying the SEH list to add in a new handler. Because of this, APCProc can obtain the TIB without any extra code, and it basically becomes a copy/paste from part one.

void CALLBACK APCProc(ULONG_PTR dwParam)
{
    fprintf(stderr, "APC callback invoked. Raising exception to trigger exception handler.\n");
 
    EXCEPTION_REGISTRATION *pHandlerBase = (EXCEPTION_REGISTRATION *)__readfsdword(0x18);
 
    fprintf(stderr, "Segment address of target thread: %p.\n", pHandlerBase);
 
    EXCEPTION_REGISTRATION NewHandler = { pHandlerBase->pPrevHandler,
        (EXCEPTION_REGISTRATION::pFncHandler)(MyTestHandler) };
 
    pHandlerBase->pPrevHandler = &NewHandler;
 
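    RaiseException(STATUS_ACCESS_VIOLATION, 0, 0, nullptr);

    //Unlink the record before its stack frame goes away.
    pHandlerBase->pPrevHandler = NewHandler.pPrevHandler;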
}

The handler, MyTestHandler, being independent of all of this, doesn’t change much either.

EXCEPTION_DISPOSITION __cdecl MyTestHandler(EXCEPTION_RECORD *pExceptionRecord, void *pEstablisherFrame,
    CONTEXT *pContextRecord, void *pDispatcherContext)
{
    if (pExceptionRecord->ExceptionCode == STATUS_ACCESS_VIOLATION)
    {
        MessageBox(0, L"Some hidden functionality can go here.",
            L"Test", 0);
        return ExceptionContinueExecution;
    }
 
    return ExceptionContinueSearch;
}

Below are some screenshots of this at work on a 32-bit Notepad++ instance.

Thread 5504 is chosen here.
The MessageBox in the exception handler successfully pops up. Hitting the “OK” button resumes execution as normal.

The source for the projects (Visual Studio 2013, Update 4) presented in these parts can be found here. Thanks for reading and follow on Twitter for more updates.

April 3, 2015

Hiding Functionality with Exception Handlers (1/2)

Filed under: General x86,Programming,Reverse Engineering — admin @ 9:56 AM

This post will cover the topic of hiding functionality by taking advantage of the OS-supported exception handling provided by Windows. Namely, it will cover Structured Exception Handling (SEH), and how it can be utilized to obscure control flow at runtime and make it more difficult to perform static analysis on a binary. Only the relevant parts of SEH will be covered here; the full details can be found on the MSDN page. Due to the differences in exception handling between Windows on x86 and x64, the general technique presented and accompanying code is relevant on x86 only. The code presented here will also show how to manually add exception records, without the use of the SetUnhandledExceptionFilter API. This technique has been seen in PE protectors, anti-intrusion bypass systems, and malware. As always, the material presented is for educational and research purposes; don’t do anything dumb/criminal with it.

Structured exception handling is best demonstrated through the use of the Microsoft extensions to C++ exception handling, namely using __try and __except statements. For example, the following code employs use of SEH:

__try
{
    printf("Hello, World!\n");
    int *pNull = nullptr;
    *pNull = 0x0BADC0DE;
}
__except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION)
{
    printf("In exception handler.\n");
}

Using SEH, the access violation arising from the null pointer dereference will be caught by the user defined handler; something not possible in standard C++. How this works is that at the beginning of the function, the compiler sets up the exception frame for this code. Viewing the disassembly for the function, it becomes more apparent how this happens.

00B21003 6A FF                push        0FFFFFFFFh  
00B21005 68 18 25 B2 00       push        0B22518h  
00B2100A 68 AC 10 B2 00       push        0B210ACh  
00B2100F 64 A1 00 00 00 00    mov         eax,dword ptr fs:[00000000h]  
00B21015 50                   push        eax  
00B21016 64 89 25 00 00 00 00 mov         dword ptr fs:[0],esp  

This code appears confusing at first, but can be cleared up by reading the crash course explanation page linked above. The code begins by pushing three values onto the stack. The first two, 0xFFFFFFFF and 0x0B22518 (shown in green), will be ignored in the explanation and correspond to values in the exception record: the try level and the scope table. There are some obfuscation tricks to manipulating the scope table that can be done, but they won’t be discussed in this post. The full explanation of these fields and their purpose can be found on the crash course page. The next value, 0x0B210AC, is an important one. Following this through in a debugger leads to the symbol __except_handler4.

This is the topmost handler of the exception chain and begins dispatching the exception. SEH works in such a way that the topmost exception handler in the chain is called and has a chance to handle the exception. If the exception is not handled, then the next entry in the exception chain is called until the exception is either handled or the final exception handler is called and the program aborts with an unhandled exception.
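The walk described above boils down to a loop over a singly linked list. The sketch below is a deliberately simplified model: the handler signature is cut down to a single exception code, the chain ends at nullptr rather than the end marker Windows uses, and Registration, Dispatch, DeclineAll, and AcceptFive are all names invented for this example.

```cpp
enum Disposition { ContinueExecution, ContinueSearch };

// Simplified registration record: a link to the next (older) record and
// a handler callback. The real record and handler carry more information.
struct Registration
{
    Registration *prev;
    Disposition (*handler)(unsigned code);
};

// Sample handlers: one declines everything, one accepts only code 5.
inline Disposition DeclineAll(unsigned) { return ContinueSearch; }
inline Disposition AcceptFive(unsigned code)
{
    return code == 5 ? ContinueExecution : ContinueSearch;
}

// Walk from the most recently installed record towards the oldest one
// and stop at the first handler that accepts the exception.
inline bool Dispatch(const Registration *head, unsigned code)
{
    for (const Registration *r = head; r != nullptr; r = r->prev)
    {
        if (r->handler(code) == ContinueExecution)
            return true;
    }
    return false; // fell off the end: the exception is unhandled
}
```

With a chain of DeclineAll followed by AcceptFive, code 5 is handled by the second record, while any other code runs off the end and counts as unhandled.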

Afterwards, the value in FS:[0] is moved into the EAX register. On x86 Windows, the FS segment maps a special per-thread structure called the Thread Information Block (TIB). Among other things of interest, this structure contains a pointer to the current SEH frame at its base (+0x0), so FS:[0] reads the head of the SEH list. This value is then pushed onto the stack and the stack pointer in ESP is moved into FS:[0]. What is happening here is that an exception record structure is getting constructed on the stack and is being stored at the head of the SEH list. This allows for proper stack unwinding and exception handler call order in the event of an exception. The format of the exception record is documented on the crash course page and can be converted to a structure, with the irrelevant fields omitted, as follows:

typedef struct _EXCEPTION_REGISTRATION
{
    using pFncHandler = void (__cdecl*)(EXCEPTION_RECORD *, _EXCEPTION_REGISTRATION *,
        CONTEXT *, EXCEPTION_RECORD *);
 
    struct _EXCEPTION_REGISTRATION *pPrevHandler;
    pFncHandler pHandler;
 
    //Missing fields here:
    //Scope table
    //Try level
    //EBP
} EXCEPTION_REGISTRATION, *PEXCEPTION_REGISTRATION;

Now knowing the layout of these exception records and where to find them in memory, it is rather straightforward to modify the list. The steps are as follows:

  • Get the address of the TIB through the FS segment
  • Get a pointer to the current SEH frame from the TIB
  • Replace the head of the current SEH frame with a custom handler

Put into code, it looks like the following:

#include <cstdio>
#include <Windows.h>
 
typedef struct _EXCEPTION_REGISTRATION
{
    using pFncHandler = void (__cdecl *)(EXCEPTION_RECORD *, _EXCEPTION_REGISTRATION *,
        CONTEXT *, EXCEPTION_RECORD *);
 
    struct _EXCEPTION_REGISTRATION *pPrevHandler;
    pFncHandler pHandler;
 
} EXCEPTION_REGISTRATION, *PEXCEPTION_REGISTRATION;
 
//Base of TIB structure but we only care about exception chain.
EXCEPTION_REGISTRATION *pHandlerBase = (EXCEPTION_REGISTRATION *)__readfsdword(0x18);
 
EXCEPTION_DISPOSITION __cdecl MyTestHandler(EXCEPTION_RECORD *pExceptionRecord, void *pEstablisherFrame,
    CONTEXT *pContextRecord, void *pDispatcherContext)
{
    printf("Hello, World!\n");
 
    return ExceptionContinueExecution;
}
 
int main(int argc, char *argv[])
{
    fprintf(stderr, "TIB Base (Pointer to current SEH Frame): %p.\n", pHandlerBase);
 
    EXCEPTION_REGISTRATION NewHandler = { pHandlerBase->pPrevHandler,
        (EXCEPTION_REGISTRATION::pFncHandler)(MyTestHandler) };
 
    //Actually the pointer to first exception handler
    pHandlerBase->pPrevHandler = &NewHandler;
 
    RaiseException(0, 0, 0, nullptr);
 
    return 0;
}

Here a new handler, MyTestHandler, is added to the SEH chain. It gets invoked on the RaiseException call and tells the program to continue execution after printing out a string. Looking at the disassembly, there were no exception records generated for the code since it doesn’t use SEH, so the RaiseException call will appear to go to the unhandled exception filter and crash the application. However, the installation of the handler at runtime through the TIB prevents this and actually results in a call to somewhere unexpected. In addition to adding a new handler, it is also possible to replace an existing one.
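One thing the sample above skips is unlinking NewHandler again. That is harmless when the process exits right away, but in a longer-lived function the chain would be left pointing at a dead stack frame. A scoped guard is one way to pair the two steps. The sketch below uses a mock TIB so it runs anywhere; FakeTib, Record, and ScopedRecord are hypothetical names, and a real version would read and write FS:[0] instead.

```cpp
// Simplified stand-ins: FakeTib::exceptionList plays the role of FS:[0].
struct Record
{
    Record *prev;
    int tag;
};

struct FakeTib
{
    Record *exceptionList;
};

// Links a record in at the head on construction and unlinks it again on
// destruction, so the chain never points at a dead stack frame.
class ScopedRecord
{
public:
    ScopedRecord(FakeTib &tib, int tag)
        : m_tib(tib), m_record{ tib.exceptionList, tag }
    {
        m_tib.exceptionList = &m_record;
    }

    ~ScopedRecord()
    {
        m_tib.exceptionList = m_record.prev;
    }

    ScopedRecord(const ScopedRecord &) = delete;
    ScopedRecord &operator=(const ScopedRecord &) = delete;

private:
    FakeTib &m_tib;
    Record m_record;
};
```

This mirrors what the compiler-generated prologue and epilogue do for __try blocks: save the old head, install the new record, and put the old head back on the way out.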

Replacing entries in the SEH chain works on a per-thread basis. If the SEH list is modified on one thread and another thread raises an exception, the new SEH handler will not be called. Replacing SEH handlers for arbitrary threads and dispatching exceptions to run in their context will be the topic of the next post.

March 28, 2015

Thoughts on Modern C++

Filed under: Programming — admin @ 1:49 PM

I’ve recently finished reading Effective Modern C++, which is the continuation of the “Effective C++” series for C++11/14. The book covered most of the newer features of modern C++ along with sufficient code examples to show their usage and applications. Overall, I’m pretty excited to see some of these features getting adopted in current and future C++ code bases. The motivation for a lot of these features naturally stemmed from the reasons of performance and efficiency, as is to be expected with anything C++ related.

Features such as auto, nullptr, constexpr, alias declarations, override, default/delete declarations, lambdas, smart pointers, and other features allow for smaller, cleaner, and in the case of some, less bug-prone code. At the same time, type traits, noexcept, rvalue/universal references and their move/forward semantics, allow for more efficient code generation and run-time performance gains.

However, some of these features are not without their pitfalls. For example, the auto keyword can behave in surprising ways due to the rules of modern C++ type deduction.

    int x = 123;
    auto y{ 123 };
 
    std::cout
        << "type of x: " << typeid(x).name()
        << std::endl
        << "type of y " << typeid(y).name()
        << std::endl;

The output of the preceding code is

type of x: int
type of y class std::initializer_list < int >

and

    std::vector<bool> boolVec = { true, false, false, true, true, false };
 
    bool bSecondElem = boolVec[1];
    auto bSecondElemAuto = boolVec[1];
 
    std::cout
        << "type of bSecondElem: " << typeid(bSecondElem).name()
        << std::endl
        << "type of bSecondElemAuto " << typeid(bSecondElemAuto).name()
        << std::endl;

outputs

type of bSecondElem: bool
type of bSecondElemAuto class std::_Vb_reference < struct std::_Wrap_alloc  > >

These were explained away by stating the different rules of auto type deduction versus template type deduction at the beginning of the book, or the pitfalls of auto type deduction when dealing with classes such as std::vector<bool>, which save space by storing a bit per item and provide a reference to a proxy object when their operator[] is invoked. There are other notable edge cases, such as passing arguments through braced initializers to forwarding templates (Item 30, along with other issues), possible problems of dangling references from using default-capture lambdas (Item 31), and others. Even given these, I’m still excited to use these features (where appropriate) and see the benefits that modern C++ brings.
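Both pitfalls can be pinned down with static_assert instead of relying on typeid output, which varies by compiler. This is a small sketch: CopyElement is a helper name made up here, and the direct-brace form auto y{ 123 } is left out of the assertions on purpose because newer compilers that implement the N3922 rule change deduce int for it.

```cpp
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <utility>
#include <vector>

auto a = 123;      // copy initialization: int
auto b = { 123 };  // copy-list-initialization: std::initializer_list<int>

static_assert(std::is_same<decltype(a), int>::value, "auto a = 123 is int");
static_assert(std::is_same<decltype(b), std::initializer_list<int>>::value,
    "auto b = { 123 } is initializer_list<int>");

// operator[] on a non-const std::vector<bool> yields a proxy, not bool&.
static_assert(
    !std::is_same<decltype(std::declval<std::vector<bool> &>()[0]), bool &>::value,
    "vector<bool>::operator[] does not return bool&");

// Convert the element to a real bool so the result stops tracking the vector.
inline bool CopyElement(const std::vector<bool> &v, std::size_t i)
{
    return static_cast<bool>(v[i]);
}
```

The practical difference shows up after a write: a proxy captured with auto keeps reflecting later changes to the vector, while a bool obtained through an explicit conversion keeps the value it had at the time of the copy.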

March 26, 2015

Malware Techniques: Code Streaming

Filed under: General x86,General x86-64,Programming — admin @ 8:59 PM

This quick post will cover the topic of code streaming. For example, take malware. One way for malware to hide and persist on a system is to not contain any malicious code. This is done by getting the malicious payload through an external source, such as a direct request to a web server, a Twitter/social media post, a Pastebin, or any other common mechanism. This code, usually encrypted or obfuscated in some way, is then mapped into the malicious process and executed. After execution, the memory region is cleaned up and reused or reallocated in order to carry out further malicious functionality. The code for this functionality looks pretty straightforward:

MemoryExecutor::MemoryExecutor(const size_t ulAllocSize)
    : m_ulAllocSize{ ulAllocSize }
{
    m_pMemory = std::unique_ptr(new char[ulAllocSize]);
}
 
const bool MemoryExecutor::MapToRegion(const char * const pBytes, const size_t ulSize)
{
    if (ulSize > m_ulAllocSize)
    {
        m_ulAllocSize = ulSize;
        m_pMemory = std::unique_ptr((char *)std::realloc(m_pMemory.get(), m_ulAllocSize));
        if (m_pMemory.get() == nullptr)
        {
            return false;
        }
    }
 
#ifdef DEBUG
    std::memset(m_pMemory.get(), 0xCC, ulSize);
#endif
 
    std::memcpy(m_pMemory.get(), pBytes, ulSize);
 
    //VirtualProtect writes the previous protection through this pointer.
    DWORD dwOldProtect = 0;
    return BOOLIFY(VirtualProtect(m_pMemory.get(), ulSize, PAGE_EXECUTE_READWRITE, &dwOldProtect));
}
 
void MemoryExecutor::ExecuteRegion()
{
    using pFnc = void (*)();
    pFnc pRuntimeFunction = (pFnc)m_pMemory.get();
    pRuntimeFunction();
 
    memset(m_pMemory.get(), 0, m_ulAllocSize);
    m_pMemory.release();
}

with the intention that pBytes in MapToRegion contains the malicious buffer. However, there are a few issues that come up, such as how to make WinAPI calls. The three solutions that I’ve seen to this come up in the wild are

  • Map position-independent shellcode that traverses the DLL list and manually implements GetProcAddress. This is done by accessing the PEB structure created for each process. The PEB structure contains a pointer to a PEB_LDR_DATA structure, which in turn contains three lists: load order, memory order, and initialization order. These three lists contain all of the DLLs loaded into the process via their base address. Once a base address for the desired DLL is obtained by traversing the list, it is possible to find its export section and traverse the export table. For an x86 assembly implementation, see here. This technique, in a mix of x86 and C, was also used by me in demonstrating how to write a file packer here.
  • Set up registers/arguments and perform the native syscall. For example, the implementation of NtTerminateProcess on x64 looks like:
NtTerminateProcess:
00007FF998AE1040 4C 8B D1             mov         r10,rcx  
00007FF998AE1043 B8 2B 00 00 00       mov         eax,2Bh  
00007FF998AE1048 0F 05                syscall  
00007FF998AE104A C3                   ret  

where the value 2Bh loaded into EAX (shown in red) is the syscall number. This approach is pretty volatile because syscall numbers can change across different Windows versions.

  • Get the addresses of the DLLs from within the malware, via GetModuleHandle, and fix up the addresses manually when they’re mapped. It’s pretty sloppy, but I’ve seen it before.

As far as stealth goes, something like the code above is pretty easy to detect. The idea of code executing off the heap (after allocating and changing the page permissions) does set off the red flags. Other implementations that I’ve seen have been to

  • Allocate executable pages upfront with VirtualAlloc. This is basically the same thing as above.
  • Locate empty blocks of memory in executable pages and map the code there. These empty blocks of memory usually occur due to alignment reasons in the code and can be exploited to store the malicious functionality. This approach is pretty convenient since the memory page(s) will already have the appropriate permissions, and when executed, won’t look as suspicious as when executing off the heap.