That wraps it up for how to create an extra-sensory perception (ESP) hack. Two important concepts were introduced in this series: the world-to-screen transformation, and hooking the underlying graphics API in order to draw information on the game's screen. Both of these concepts are applicable beyond ESP hacks. World-to-screen, although common in ESPs, can also be used when writing bots that need to react to what is on the screen, e.g. moving toward or away from an enemy. Hooking the graphics API has tons of applications, including legitimate ones such as drawing third-party overlays on the game's screen.
The availability of the Source SDK code was a great help throughout both this series and the earlier aimbot series. By having the source available, we were able to more easily reverse engineer the relevant interfaces and obtain pointers to them at runtime, and we were also able to lift the code responsible for performing the world-to-screen transformation.
The full source code for the ESP hack that was developed throughout this series is available on GitHub.
Now that we are at a point where we can get screen coordinates for an entity, the drawing part should be simple. We will start off with a very basic approach: drawing externally. This will be done by calling the DrawText function in GDI. We can create a function to achieve this as follows:
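A minimal sketch of what such a function might look like is below. The window title, the text color, and the simple Vector2 struct are my own assumptions; the real implementation just needs a window handle and a position.

#include <Windows.h>
#include <string>

// A simple two-dimensional point; assumed to match the Vector2 type used
// throughout the rest of the hack.
struct Vector2 {
    float x{};
    float y{};
};

void DrawTextGDI(const Vector2& screenPosition, const std::string& text) {
    // Find the game window by its title ("HALF-LIFE 2" is an assumption).
    auto* window{ FindWindowA(nullptr, "HALF-LIFE 2") };
    if (window == nullptr) {
        return;
    }
    // Grab a device context for the window so GDI can draw on it.
    auto* deviceContext{ GetDC(window) };
    if (deviceContext == nullptr) {
        return;
    }
    SetBkMode(deviceContext, TRANSPARENT);
    SetTextColor(deviceContext, RGB(255, 0, 0));
    // DT_NOCLIP lets the text spill outside of the zero-sized rectangle.
    RECT rect{ static_cast<LONG>(screenPosition.x), static_cast<LONG>(screenPosition.y),
        static_cast<LONG>(screenPosition.x), static_cast<LONG>(screenPosition.y) };
    DrawTextA(deviceContext, text.c_str(), -1, &rect, DT_NOCLIP);
    ReleaseDC(window, deviceContext);
}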
Here we find the game window, get the device context, and draw the text. We can write a function to loop through the entity list and draw the text above enemy entities.
void DrawEnemyEntityText() {
    // Walk the entity list from beginning to end.
    auto* serverEntity{ reinterpret_cast<IServerEntity*>(
        GetServerTools()->FirstEntity()) };
    if (serverEntity != nullptr) {
        do {
            auto* modelName{ serverEntity->GetModelName().ToCStr() };
            if (modelName != nullptr) {
                auto entityName{ std::string{ GetEntityName(serverEntity) } };
                if (IsEntityEnemy(entityName)) {
                    auto enemyEyePosition{ GetEyePosition(serverEntity) };
                    // Transform the enemy's world position to screen
                    // coordinates and only draw if it is on screen.
                    Vector2 screenPosition{};
                    auto shouldDraw{ WorldToScreen(enemyEyePosition, screenPosition) };
                    if (shouldDraw) {
                        DrawTextGDI(screenPosition, entityName);
                    }
                }
            }
            serverEntity = reinterpret_cast<IServerEntity*>(
                GetServerTools()->NextEntity(serverEntity));
        } while (serverEntity != nullptr);
    }
}
Calling this function in a loop and looking at the results, we see the following:
The results look pretty good. However, if your refresh rate is high enough, there will be a very noticeable flicker in the text. What is happening is that the game's rendering is conflicting with our drawing: we are constantly trying to draw our text on the screen, and the game engine is also constantly drawing the player's view. There are a couple of ways to get around this. You can use SetWindowLongPtr to subclass the window and install a new window procedure, which lets you handle WM_PAINT messages and draw your text, as sketched below. Or you can create an entirely new window to draw on and keep it in the foreground, though this approach has problems with games running in full-screen mode.
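As a rough sketch of the subclassing approach (assuming we are already injected into the game's process and have its window handle), it might look like this:

#include <Windows.h>

// The game's original window procedure, saved so we can forward messages.
WNDPROC g_originalWndProc{};

LRESULT CALLBACK HookedWndProc(HWND window, UINT message,
    WPARAM wParam, LPARAM lParam) {
    if (message == WM_PAINT) {
        // The window is repainting; our text could be drawn here so it is
        // not immediately painted over by the game.
    }
    // Forward everything to the original window procedure.
    return CallWindowProcW(g_originalWndProc, window, message, wParam, lParam);
}

void SubclassGameWindow(HWND gameWindow) {
    // Swap in our window procedure and remember the original one.
    g_originalWndProc = reinterpret_cast<WNDPROC>(SetWindowLongPtrW(
        gameWindow, GWLP_WNDPROC,
        reinterpret_cast<LONG_PTR>(HookedWndProc)));
}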
Ideally, we would want to render our text the same way that the game renders its graphics. This is possible, but it requires additional work. To have the game render our text, we need to hook into the function that gets executed after the game finishes rendering a frame. In Direct3D9 games this is the EndScene function, which will be our target. Fortunately, finding this function is pretty easy: since Microsoft ships symbols for d3d9.dll, we can attach a debugger, load the symbols, and get the address.
From here we can write a function that calculates its address from the module base, as we did earlier for the interfaces:
DWORD_PTR GetEndSceneAddress() {
    // The offset of EndScene from the base of d3d9.dll. This value was
    // found via symbols and is specific to this version of the DLL.
    constexpr auto globalEndSceneOffset = 0x5C0B0;
    auto endSceneAddress{ reinterpret_cast<DWORD_PTR>(
        GetModuleHandle(L"d3d9.dll")) + globalEndSceneOffset };
    return endSceneAddress;
}
Now that the function address is known, we can install a hook on it. To do this, I will be reusing the HookEngine class from an earlier article series. The hook logic will be pretty simple: we will call the DrawEnemyEntityText function as before and then call the original EndScene function.
HRESULT WINAPI EndSceneHook(IDirect3DDevice9* device) {
    // Draw our text first, then let the original EndScene run.
    DrawEnemyEntityText(device);

    using EndSceneFnc = HRESULT(WINAPI*)(IDirect3DDevice9* device);
    auto original{ reinterpret_cast<EndSceneFnc>(
        HookEngine::GetOriginalAddressFromHook(EndSceneHook)) };
    HRESULT result{};
    if (original != nullptr) {
        result = original(device);
    }
    return result;
}
In DrawEnemyEntityText, instead of calling DrawTextGDI, we will write a new function, DrawTextD3D9, which will draw the text using Direct3D APIs. DrawEnemyEntityText now also takes the device as a parameter so that it can be passed along.
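Below is a sketch of what such a function might look like using the D3DX font helpers from the DirectX SDK; the font face, size, color, and the caching of the ID3DXFont on first use are my own choices, not necessarily what the final hack uses.

#include <d3dx9.h> // Requires linking against d3dx9.lib.
#include <string>

void DrawTextD3D9(IDirect3DDevice9* device, const Vector2& screenPosition,
    const std::string& text) {
    // Create the font once on first use and reuse it on later frames.
    static ID3DXFont* font{};
    if (font == nullptr) {
        D3DXCreateFontA(device, 16, 0, FW_NORMAL, 1, FALSE, DEFAULT_CHARSET,
            OUT_DEFAULT_PRECIS, DEFAULT_QUALITY, DEFAULT_PITCH | FF_DONTCARE,
            "Arial", &font);
    }
    if (font == nullptr) {
        return;
    }
    // DT_NOCLIP draws the text at the rectangle's top-left corner without
    // clipping it to the (zero-sized) rectangle.
    RECT rect{ static_cast<LONG>(screenPosition.x),
        static_cast<LONG>(screenPosition.y), 0, 0 };
    font->DrawTextA(nullptr, text.c_str(), -1, &rect, DT_NOCLIP,
        D3DCOLOR_ARGB(255, 255, 0, 0));
}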
The DrawTextD3D9 function looks very close to DrawTextGDI, but it performs its drawing on the game’s IDirect3DDevice9 device instead of directly on top of the window. As a result, we are rendering our text in the same rendering pipeline, and the text flicker will not be present. You can see the before and after below.
We now have a functional proof of concept that performs world-to-screen transformation of entities and displays text above their heads. The steps shown throughout this series are common to all ESP hacks and can be used as a reference in building more complex ones.
The first step for developing an ESP hack is being able to draw information on the screen at a specified position. While this seems like it should be easy at first, it is actually a rather involved process, because objects in the game exist in different vector spaces than the one that corresponds to your screen.
There is local space, which puts the object's center at the origin; world space, which is a common space that all objects live in; view space, which centers the camera at the origin and looks forward; clip space, which maps objects to a fixed-size plane so that clipping can be done; and lastly, screen space, the two-dimensional (x, y) space that corresponds to the window.
Going from one vector space to another involves multiplying the coordinates in one space by a transformation matrix to get the coordinates in the other space. For example, if you have coordinates in world space and you want to map them to view space, you would multiply those coordinates by a transformation matrix called a view matrix. Because matrix multiplication is associative, you can create a matrix that performs multiple transformations in one multiplication operation. This is where the world-to-screen matrix comes in: this matrix is a combination of a view matrix and a projection matrix whose purpose is to take three-dimensional world space coordinates and map them to two-dimensional clip space. You can then transform the coordinates in clip space to normalized device coordinates and adjust for the screen's aspect ratio to get (x, y) coordinates in screen space. A concrete version of this pipeline is sketched below.
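As a rough sketch of the math (my notation, not the Source SDK's): given a world space point $(x, y, z)$, a combined world-to-screen matrix $W$, and a screen of width $w$ and height $h$,

$$\begin{pmatrix} c_x \\ c_y \\ c_z \\ c_w \end{pmatrix} = W \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}, \qquad n_x = \frac{c_x}{c_w},\; n_y = \frac{c_y}{c_w}, \qquad s_x = \frac{1 + n_x}{2}\,w,\; s_y = \frac{1 - n_y}{2}\,h$$

where the division by $c_w$ is the perspective divide into normalized device coordinates, and the final step is the viewport transform into pixel coordinates $(s_x, s_y)$; the $y$ axis is flipped because screen coordinates grow downward.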
Games will store some, or all, of these transformation matrices somewhere. Typically, you will be able to find a world-to-screen matrix directly, instead of having to find separate view and projection matrices, since the world-to-screen transformation happens so often. The process of reverse engineering a game to find these matrices is rather tedious: you will spend a lot of time panning your camera around while scanning the process memory for values that you expect. For example, if you look directly down and then directly up, you might expect certain matrix values to move within [0, 1] or [-1, 1]. There are other approaches as well, such as deriving the matrix from your camera's position and angles and then scanning for that. All of these approaches are rather involved and feature a lot of trial and error; an entire series of posts could be dedicated to reverse engineering view matrices in games.
This tedium is removed in this series because the Source SDK is open source and provides an interface that allows users to get the world-to-screen matrix. As before, we will get a pointer to this interface so that we can access its WorldToScreenMatrix function. This is done by scanning for “VEngineClient014” in the referenced strings of the running game. After attaching a debugger and searching, we can find it pretty quickly:
Looking at how it is used, we can easily obtain a pointer to the global interface:
As in the previous series, we can construct the function to retrieve the interface as such:
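Following the pattern from the aimbot series, the retrieval function might look like the following sketch. The CreateInterface export and the “VEngineClient014” version string come from the Source SDK; IVEngineClient is the SDK's interface type.

#include <Windows.h>

// The signature of the CreateInterface factory exported by engine.dll.
using CreateInterfaceFn = void* (*)(const char* name, int* returnCode);

IVEngineClient* GetEngineClient() {
    static IVEngineClient* engineClient{};
    if (engineClient == nullptr) {
        auto createInterface{ reinterpret_cast<CreateInterfaceFn>(
            GetProcAddress(GetModuleHandle(L"engine.dll"), "CreateInterface")) };
        if (createInterface != nullptr) {
            engineClient = reinterpret_cast<IVEngineClient*>(
                createInterface("VEngineClient014", nullptr));
        }
    }
    return engineClient;
}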
Now the fun can begin. We have the ability to get the world-to-screen matrix, but we don't yet know how to perform the actual transformation. There is an example of the world-to-screen matrix being used in the ScreenTransform function. This function passes the world-to-screen matrix to FrustumTransform, which is the function that performs the actual transformation: it multiplies the input point in world space by the matrix to produce an (x, y) position in clip space. To go from clip space to screen space, we need to see how ScreenTransform is called. Fortunately, there is a helpful function called GetVectorInScreenSpace that shows the viewport transformation to screen space.
For our purposes, we can lift these from the Source SDK and incorporate them into the ESP hack with minor modifications. FrustumTransform will stay more or less as it was:
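Adapted from the SDK, it looks something like this (VMatrix and Vector are Source SDK types; the return value indicates whether the point is in front of the camera):

bool FrustumTransform(const VMatrix& worldToSurface, const Vector& point,
    Vector& screen) {
    // Multiply the world space point by the matrix's x, y, and w rows.
    screen.x = worldToSurface[0][0] * point[0] + worldToSurface[0][1] * point[1]
        + worldToSurface[0][2] * point[2] + worldToSurface[0][3];
    screen.y = worldToSurface[1][0] * point[0] + worldToSurface[1][1] * point[1]
        + worldToSurface[1][2] * point[2] + worldToSurface[1][3];
    auto w{ worldToSurface[3][0] * point[0] + worldToSurface[3][1] * point[1]
        + worldToSurface[3][2] * point[2] + worldToSurface[3][3] };

    auto facing{ true };
    if (w < 0.001f) {
        // The point is behind the camera; push it far off screen instead
        // of dividing by a near-zero w.
        facing = false;
        screen.x *= 100000.0f;
        screen.y *= 100000.0f;
    } else {
        // The perspective divide into normalized device coordinates.
        auto invW{ 1.0f / w };
        screen.x *= invW;
        screen.y *= invW;
    }
    return facing;
}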
and we can modify GetVectorInScreenSpace slightly to only return true for positions that are visible in screen space. The function has also been renamed to WorldToScreen for clarity:
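A sketch of the modified function is below; GetScreenSize and WorldToScreenMatrix are both part of the IVEngineClient interface, and GetEngineClient is the interface getter from earlier.

bool WorldToScreen(const Vector& worldPosition, Vector2& screenPosition) {
    // Transform the world space position into normalized device coordinates.
    Vector clipSpace{};
    auto facing{ FrustumTransform(GetEngineClient()->WorldToScreenMatrix(),
        worldPosition, clipSpace) };

    int width{};
    int height{};
    GetEngineClient()->GetScreenSize(width, height);

    // The viewport transform: map [-1, 1] coordinates to pixels, flipping
    // the y axis since screen coordinates grow downward.
    screenPosition.x = 0.5f * (1.0f + clipSpace.x) * width;
    screenPosition.y = 0.5f * (1.0f - clipSpace.y) * height;

    // Only report success when the point is in front of the camera and
    // within the visible bounds of the screen.
    return facing
        && clipSpace.x >= -1.0f && clipSpace.x <= 1.0f
        && clipSpace.y >= -1.0f && clipSpace.y <= 1.0f;
}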
We can modify the aimbot code to print out the (x, y) screen coordinates of the closest enemy to the console to test this function out. After making the appropriate adjustments, we get the following results
which seem like legitimate numbers given the window size and resolution. The transformation appears to be successful! We are now able to translate an enemy's world position to screen space. The first step in developing the ESP hack is done; now on to drawing.
Extra-Sensory Perception (ESP) hacks are a type of game hack that involves showing information to the player that they would not normally see. For example, these types of hacks might display enemies' positions, their distance from the player, their health, what weapon they are using, and so on. They can also be more elaborate, changing enemy models to a more visible color or drawing bounding boxes around them. This is all done with the purpose of providing the player with additional information that would not normally be visible to them. A few examples of ESP hacks that demonstrate this are shown below.
This next series of posts will go over how these hacks are created and provide a working proof of concept for an ESP targeting Half-Life 2. The series will start off by talking about the world-to-screen transformation: how models with a three-dimensional position in the world get transformed to a two-dimensional (x, y) coordinate system on your screen. Having established that, the series will then go over how to draw information on the game's screen, and wrap up by showing an ESP proof of concept that draws text over enemy models.
These posts will heavily leverage what was covered in the creating-an-aimbot series. It is recommended to read that series first to get a better understanding of the Source SDK. Code from that series will also be heavily reused here.