These next two posts will cover the topic of hooking Windows system calls under WoW64, the Windows subsystem responsible for running x86 code on a 64-bit version of Windows. This post will be a brief introduction to system calls under WoW64 and how the transition between x86 and x64 occurs. It will lay the groundwork for the actual task (in part 2) of inserting a hook into this process to intercept desired system calls.
Anatomy of a System Call
On a native 64-bit version of Windows, there are two ways that system calls get made: natively or through WoW64. The native case is rather straightforward. For example, take the following code snippet, compiled as x64, performing an NtWriteVirtualMemory syscall:
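#include <windows.h>
#include <winternl.h> // NTSTATUS
#include <cstdio>

// NtWriteVirtualMemory is not declared in the SDK headers, so declare a
// matching function pointer type and resolve it from ntdll.dll at run time.
// The size parameters are pointer-sized (SIZE_T), which matters on x64.
using pNtWriteVirtualMemory = NTSTATUS (NTAPI *)(HANDLE ProcessHandle,
    PVOID BaseAddress, PVOID Buffer, SIZE_T NumberOfBytesToWrite,
    PSIZE_T NumberOfBytesWritten);
pNtWriteVirtualMemory NtWriteVirtualMemory = nullptr;
int main(int argc, char *argv[])
{
    HMODULE hModule = GetModuleHandleW(L"ntdll.dll");
    NtWriteVirtualMemory = (pNtWriteVirtualMemory)GetProcAddress(hModule,
        "NtWriteVirtualMemory");
    int i = 0x321;
    int j = 0x123;
    fprintf(stderr, "j = %X\n", j);
    // Overwrite j with the value of i through the raw system call.
    SIZE_T numBytesWritten = 0;
    NTSTATUS success = NtWriteVirtualMemory(GetCurrentProcess(), &j, &i,
        sizeof(int), &numBytesWritten);
    fprintf(stderr, "j = %X\n", j);
    return 0;
}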
The disassembly for the call looks like the following:
00007FF6C49D193A FF 15 C0 C6 00 00 call qword ptr [__imp_GetCurrentProcess (07FF6C49DE000h)]
00007FF6C49D1940 48 8D 4D 64 lea rcx,[numBytesWritten]
00007FF6C49D1944 48 89 4C 24 20 mov qword ptr [rsp+20h],rcx
00007FF6C49D1949 41 B9 04 00 00 00 mov r9d,4
00007FF6C49D194F 4C 8D 45 24 lea r8,[i]
00007FF6C49D1953 48 8D 55 44 lea rdx,[j]
00007FF6C49D1957 48 8B C8 mov rcx,rax
00007FF6C49D195A FF 15 00 A8 00 00 call qword ptr [NtWriteVirtualMemory (07FF6C49DC160h)]
You can see the first four arguments being placed in the RCX, RDX, R8, and R9 registers, per the standard calling convention for x64 on Windows. The fifth parameter is put onto the stack at [rsp+20h], just above the 32 bytes of shadow space that the caller reserves for the four register arguments. When the call is made to NtWriteVirtualMemory, the following code is executed:
00007FFC88BB1560 4C 8B D1 mov r10,rcx
00007FFC88BB1563 B8 39 00 00 00 mov eax,39h
00007FFC88BB1568 0F 05 syscall
00007FFC88BB156A C3 ret
Here RCX (the first parameter) is moved into R10, because the syscall instruction itself overwrites RCX with the return address. Then 0x39 is moved into EAX. Afterwards, the syscall instruction is executed, which switches to kernel mode, where the actual call is carried out. The magic value 0x39 is the syscall number corresponding to NtWriteVirtualMemory on x64 Windows 8.1. A very useful table of syscalls for x86 and x64 Windows versions can be found here. When the syscall finishes, execution resumes at the RET instruction, which returns to the instruction following the original call site.
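Because the stub is so regular, the syscall number can be read straight out of its bytes. The following is a minimal sketch of that idea, not code from this project: DecodeX64SyscallNumber is a hypothetical helper, and it assumes the exact byte layout shown in the listing above. Stubs on other Windows versions may be laid out differently, in which case the helper simply reports failure.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the original project).
// Expects the stub layout from the listing above:
//   4C 8B D1          mov r10, rcx
//   B8 imm32          mov eax, <syscall number>
//   0F 05             syscall
// Returns true and stores the syscall number when the bytes match that
// layout; returns false and leaves `number` untouched otherwise.
bool DecodeX64SyscallNumber(const std::uint8_t* stub, std::size_t size,
                            std::uint32_t& number)
{
    if (stub == nullptr || size < 10) {
        return false;
    }
    // mov r10, rcx
    if (stub[0] != 0x4C || stub[1] != 0x8B || stub[2] != 0xD1) {
        return false;
    }
    // mov eax, imm32
    if (stub[3] != 0xB8) {
        return false;
    }
    // syscall
    if (stub[8] != 0x0F || stub[9] != 0x05) {
        return false;
    }
    // The immediate is stored little-endian in bytes 4..7.
    number = static_cast<std::uint32_t>(stub[4]) |
             (static_cast<std::uint32_t>(stub[5]) << 8) |
             (static_cast<std::uint32_t>(stub[6]) << 16) |
             (static_cast<std::uint32_t>(stub[7]) << 24);
    return true;
}
```

Fed the eleven bytes of the stub above (4C 8B D1 B8 39 00 00 00 0F 05 C3), the helper reports 0x39. In a real x64 process the same bytes could be read from the address that GetProcAddress returns for the function.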
x86 to x64 Transition
For an x86 process running under WoW64 on a 64-bit system, things change a bit. Looking at the x86 disassembly for the call initially shows nothing out of the ordinary:
00B51117 8D 45 EC lea eax,[numBytesWritten]
00B5111A 50 push eax
00B5111B 6A 04 push 4
00B5111D 8D 4D F4 lea ecx,[i]
00B51120 51 push ecx
00B51121 8D 55 F0 lea edx,[j]
00B51124 52 push edx
00B51125 FF 15 00 30 B5 00 call dword ptr ds:[0B53000h] ; GetCurrentProcess
00B5112B 50 push eax
00B5112C FF 15 18 40 B5 00 call dword ptr ds:[0B54018h] ; NtWriteVirtualMemory
However, stepping into NtWriteVirtualMemory itself shows the following:
77ECC810 B8 39 00 00 00 mov eax,39h
77ECC815 64 FF 15 C0 00 00 00 call dword ptr fs:[0C0h]
77ECC81C C2 14 00 ret 14h
77ECC81F 90 nop
Here the syscall number is moved into EAX, as in the x64 example. Instead of executing a syscall instruction, though, the stub makes an indirect call through the pointer stored at FS:[0xC0]. The FS segment holds thread-local data, so its base address is unique per thread. Despite that, the pointer stored at FS:[0xC0] is the same for every thread in the process. The trailing ret 14h then removes the five 4-byte arguments (0x14 = 20 bytes) from the stack. The claim about FS:[0xC0] can quickly be verified with some test code that prints the FS segment base address and the contents of FS:[0xC0] for several threads:
#include <windows.h>
#include <cstdio>

// Build as x86: the FS-based layout examined here only applies to 32-bit code.
DWORD WINAPI ThreadEntry(LPVOID lpParameter)
{
    // Block on the event so the thread stays alive while it is inspected.
    WaitForSingleObject((HANDLE)lpParameter, INFINITE);
    return 0;
}
int main(int argc, char *argv[])
{
    HANDLE hEvent = CreateEventW(nullptr, FALSE, FALSE, L"Useless event");
    HANDLE hThread1 = CreateThread(nullptr, 0, &ThreadEntry, hEvent, 0, nullptr);
    HANDLE hThread2 = CreateThread(nullptr, 0, &ThreadEntry, hEvent, 0, nullptr);
    HANDLE hThread3 = CreateThread(nullptr, 0, &ThreadEntry, hEvent, 0, nullptr);
    // Read each thread's register context to obtain its FS selector.
    CONTEXT ctxThread1 = { CONTEXT_ALL };
    (void)GetThreadContext(hThread1, &ctxThread1);
    CONTEXT ctxThread2 = { CONTEXT_ALL };
    (void)GetThreadContext(hThread2, &ctxThread2);
    CONTEXT ctxThread3 = { CONTEXT_ALL };
    (void)GetThreadContext(hThread3, &ctxThread3);
    // Look up the descriptor that each FS selector refers to.
    LDT_ENTRY ldtThread1 = { 0 };
    LDT_ENTRY ldtThread2 = { 0 };
    LDT_ENTRY ldtThread3 = { 0 };
    (void)GetThreadSelectorEntry(hThread1, ctxThread1.SegFs, &ldtThread1);
    (void)GetThreadSelectorEntry(hThread2, ctxThread2.SegFs, &ldtThread2);
    (void)GetThreadSelectorEntry(hThread3, ctxThread3.SegFs, &ldtThread3);
    // Assemble each 32-bit segment base from the three descriptor fields.
    DWORD_PTR dwFSBase1 = (ldtThread1.HighWord.Bits.BaseHi << 24) |
        (ldtThread1.HighWord.Bits.BaseMid << 16) |
        ldtThread1.BaseLow;
    DWORD_PTR dwFSBase2 = (ldtThread2.HighWord.Bits.BaseHi << 24) |
        (ldtThread2.HighWord.Bits.BaseMid << 16) |
        ldtThread2.BaseLow;
    DWORD_PTR dwFSBase3 = (ldtThread3.HighWord.Bits.BaseHi << 24) |
        (ldtThread3.HighWord.Bits.BaseMid << 16) |
        ldtThread3.BaseLow;
    // %X expects a 32-bit value, which DWORD_PTR is in an x86 build.
    fprintf(stderr, "Thread 1 FS Segment base address: %X\n"
        "Thread 2 FS Segment base address : %X\n"
        "Thread 3 FS Segment base address : %X\n",
        dwFSBase1, dwFSBase2, dwFSBase3);
    // Read the pointer stored at offset 0xC0 of each thread's FS segment.
    DWORD_PTR dwWOW64Address1 = *(DWORD_PTR *)((unsigned char *)dwFSBase1 + 0xC0);
    DWORD_PTR dwWOW64Address2 = *(DWORD_PTR *)((unsigned char *)dwFSBase2 + 0xC0);
    DWORD_PTR dwWOW64Address3 = *(DWORD_PTR *)((unsigned char *)dwFSBase3 + 0xC0);
    fprintf(stderr, "Thread 1 FS:[0xC0] : %X\n"
        "Thread 2 FS:[0xC0] : %X\n"
        "Thread 3 FS:[0xC0] : %X\n",
        dwWOW64Address1, dwWOW64Address2, dwWOW64Address3);
    // The worker threads are still blocked; returning from main ends the process.
    return 0;
}
The output of the preceding code is
Thread 1 FS Segment base address: 7FDBB000
Thread 2 FS Segment base address : 7FDB8000
Thread 3 FS Segment base address : 7FC8F000
Thread 1 FS:[0xC0] : 77E81218
Thread 2 FS:[0xC0] : 77E81218
Thread 3 FS:[0xC0] : 77E81218
which verifies the original claim.
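The shifting in the test code is easier to follow in isolation. A segment descriptor stores its 32-bit base address in three separate pieces, and the test code stitches them back together. Below is a small portable sketch of just that step. The DescriptorBase struct is a stand-in invented here so the example builds without windows.h (it mirrors the BaseLow, BaseMid, and BaseHi fields used above), and both function names are hypothetical.

```cpp
#include <cstdint>

// Illustrative stand-in for the base-address fields of a segment
// descriptor; not the real Windows LDT_ENTRY definition.
struct DescriptorBase {
    std::uint16_t baseLow; // bits 0..15 of the base
    std::uint8_t baseMid;  // bits 16..23 of the base
    std::uint8_t baseHi;   // bits 24..31 of the base
};

// Reassemble the 32-bit linear base address from its three pieces.
std::uint32_t SegmentBase(const DescriptorBase& descriptor)
{
    return (static_cast<std::uint32_t>(descriptor.baseHi) << 24) |
           (static_cast<std::uint32_t>(descriptor.baseMid) << 16) |
           static_cast<std::uint32_t>(descriptor.baseLow);
}

// Linear address of the pointer that the x86 stub calls through,
// i.e. the location that FS:[0xC0] refers to for this segment base.
std::uint32_t TransitionSlotAddress(std::uint32_t fsBase)
{
    return fsBase + 0xC0;
}
```

For thread 1 in the output above, the pieces would be 0x7F, 0xDB, and 0xB000, which combine to 0x7FDBB000. The pointer read by the test code therefore lives at 0x7FDBB0C0.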
Moving back to the original disassembly, stepping into this CALL instruction leads to the following:
77E81216 00 00 add byte ptr [eax],al
77E81218 EA 84 1D E8 77 33 00 jmp 0033:77E81D84
77E8121F 00 00 add byte ptr [eax],al
Ignore the nonsense instructions around it. The surrounding zero bytes are not real code, and the disassembly listing is simply not quite correct there. The important instruction is the jump to 0x77E81D84. This is an inter-segment (far) jump, and the 0033 is the selector of the target code segment, which means execution continues in 64-bit mode. Before the jump the selector was 0023, which corresponds to x86 code. An interesting thing is that the Visual Studio debugger is unable to step into this address, although x64 WinDbg can.
Taking a look at where this address resides in memory reveals something interesting: it resides in wow64cpu.dll, one of the three core WoW64 DLLs that get loaded into every process running under WoW64. That particular DLL is the one responsible for handling the transition from x86 to x64. Interestingly enough, it is a 64-bit DLL that is loaded into a 32-bit process. The instructions at 0x77E81D84, traced through to the syscall, are the following:
wow64cpu!CpupReturnFromSimulatedCode
00000000`77e81d84 4987e6 xchg rsp,r14
00000000`77e81d87 458b06 mov r8d,dword ptr [r14] ds:00000000`008df5e0=77ecc78c
00000000`77e81d8a 4983c604 add r14,4
00000000`77e81d8e 4589453c mov dword ptr [r13+3Ch],r8d ds:00000000`007dfdec=77eea9b0
00000000`77e81d92 45897548 mov dword ptr [r13+48h],r14d ds:00000000`007dfdf8=008df670
00000000`77e81d96 4d8d5e04 lea r11,[r14+4]
00000000`77e81d9a 41897d20 mov dword ptr [r13+20h],edi ds:00000000`007dfdd0=00000000
00000000`77e81d9e 41897524 mov dword ptr [r13+24h],esi ds:00000000`007dfdd4=00000000
00000000`77e81da2 41895d28 mov dword ptr [r13+28h],ebx ds:00000000`007dfdd8=7f676000
00000000`77e81da6 41896d38 mov dword ptr [r13+38h],ebp ds:00000000`007dfde8=00000000
00000000`77e81daa 9c pushfq
00000000`77e81dab 4158 pop r8
00000000`77e81dad 45894544 mov dword ptr [r13+44h],r8d ds:00000000`007dfdf4=00000000
wow64cpu!TurboDispatchJumpAddressStart
00000000`77e81db1 8bc8 mov ecx,eax
00000000`77e81db3 c1e910 shr ecx,10h
00000000`77e81db6 41ff24cf jmp qword ptr [r15+rcx*8] ds:00000000`77e81b38=0000000077e822d0
wow64cpu!TurboDispatchJumpAddressEnd+0x516
00000000`77e822d0 418b5304 mov edx,dword ptr [r11+4] ds:00000000`008df5ec=00000000
00000000`77e822d4 458b13 mov r10d,dword ptr [r11] ds:00000000`008df5e8=008df5fc
00000000`77e822d7 eb3a jmp wow64cpu!TurboDispatchJumpAddressEnd+0x559 (00000000`77e82313)
wow64cpu!TurboDispatchJumpAddressEnd+0x559
00000000`77e82313 e838000000 call wow64cpu!TurboDispatchJumpAddressEnd+0x596 (00000000`77e82350)
wow64cpu!CpupSyscallStub
00000000`77e82350 0f05 syscall
00000000`77e82352 c3 ret
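The entry point to the 64-bit code resides at the symbol CpupReturnFromSimulatedCode. This code switches to the 64-bit stack (xchg rsp,r14), saves the 32-bit return address, stack pointer, registers, and flags into a save area addressed by R13, and then uses the upper 16 bits of EAX as an index into a jump table addressed by R15. The selected routine loads the parameters from the 32-bit stack and finally executes the syscall in CpupSyscallStub. For a fuller explanation of everything involved here, see this post.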
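The far jump that transfers control to this entry point can also be decoded by hand, which makes the selector and target address explicit. The sketch below is illustrative only: FarPointer, DecodeFarJump, and SelectorBitness are names made up for this example. It assumes the EA opcode form seen earlier (a 4-byte little-endian offset followed by a 2-byte selector) and the two selector values discussed in this post.

```cpp
#include <cstddef>
#include <cstdint>

// Target of a far jump: a code segment selector plus an offset within it.
struct FarPointer {
    std::uint16_t selector;
    std::uint32_t offset;
};

// Decode a direct far jump (opcode EA, then a 32-bit offset, then a
// 16-bit selector, both little-endian). Returns false when the bytes do
// not start with that opcode or the buffer is too short.
bool DecodeFarJump(const std::uint8_t* code, std::size_t size, FarPointer& target)
{
    if (code == nullptr || size < 7 || code[0] != 0xEA) {
        return false;
    }
    target.offset = static_cast<std::uint32_t>(code[1]) |
                    (static_cast<std::uint32_t>(code[2]) << 8) |
                    (static_cast<std::uint32_t>(code[3]) << 16) |
                    (static_cast<std::uint32_t>(code[4]) << 24);
    target.selector = static_cast<std::uint16_t>(
        code[5] | (static_cast<unsigned>(code[6]) << 8));
    return true;
}

// Map the two selectors discussed in this post to a code bitness.
// Returns 0 for any selector this sketch does not know about.
int SelectorBitness(std::uint16_t selector)
{
    switch (selector) {
    case 0x23: return 32; // x86 code under WoW64
    case 0x33: return 64; // native x64 code
    default:   return 0;
    }
}
```

Applied to the bytes EA 84 1D E8 77 33 00 from the listing, this yields selector 0x0033 and offset 0x77E81D84, matching what the debugger displays as jmp 0033:77E81D84.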
That’s all that is involved in performing syscalls under WoW64. This article hopefully elucidated a few things about how x86 code can perform syscalls on an x64 system. With this baseline, the next article will cover what is involved in intercepting these syscalls.
Get the Code
The Visual Studio 2015 RC project for this example can be found here. The source code is viewable on GitHub here.