API Hooking Techniques for GRID Cloud Game Streaming
NVIDIA GRID hardware and software provide a highly efficient and cost-effective infrastructure for cloud game streaming. With bare-metal server configurations, API hooking is the key technique in developing a full-fledged streaming solution. This technical blog presents several recommendations for this important technique.
Introduction
To facilitate cloud gaming powered by NVIDIA GRID, a software layer for managing multiple concurrent game sessions is needed. This can be implemented without modifying the application, by intercepting calls to the DirectX or OpenGL graphics API and adding the necessary GRID API calls for capture and encode. We will refer to this software layer as the SHIM layer. In the SHIM layer, a rendered image is grabbed from the rendering buffer and sent to the H.264 hardware encoder for video compression. API hooking is the key to implementing the SHIM layer.

This blog shows several techniques for API hooking. It is not a tutorial, but a review of several possible approaches. We assume the reader has a good understanding of DLL exported functions and of the v-tables of C++ objects. Some programming experience with API hooking will also be very helpful in understanding these recommendations.
We use D3D9 as an example. While we only need to hook IDirect3DDevice9::Present() to get everything needed, this method cannot be hooked directly, for two reasons:
1. The IDirect3DDevice9 object is created by IDirect3D9::CreateDevice(), which we also need to hook in order to get a reference to the IDirect3DDevice9 object. Both IDirect3DDevice9::Present() and IDirect3D9::CreateDevice() are COM object methods.
2. The IDirect3D9 object is in turn created by Direct3DCreate9(), which is an exported function of d3d9.dll. This function needs to be hooked too, and it is a DLL exported function rather than a COM object method.
In general, there are two kinds of API functions to hook: DLL exported functions and COM object methods. Let’s look in detail at how to hook each kind.
Hooking DLL exported functions
There are two approaches to hooking DLL exported functions:
(1a) Using hooking libraries such as Detours or Mhook. These libraries patch the binary code at the entry point of an API function in memory so that calls to the function are intercepted. The library sets up the hook while the game process is being loaded.
(1b) Replacing the DLL file. We can create a proxy DLL that exports all the functions of the original d3d9.dll and has the same filename, d3d9.dll. The original d3d9.dll should be renamed (say, _d3d9.dll) so that its functions can still be loaded and called from our proxy DLL. The proxy DLL is placed somewhere on the game’s DLL search path, such as the system folder, the game folder, or a path set by SetDllDirectory(), so that the game loads our proxy upon launch.
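The forwarding idea behind approach (1b) can be modeled in portable C++. This is only a sketch under stated assumptions, not a real proxy DLL: FakeD3D9, modules(), load_export() and the Real_/Proxy_ function names are hypothetical stand-ins for the D3D9 runtime and the Windows loader. On Windows the lookup would typically be done with LoadLibrary() and GetProcAddress() against the renamed _d3d9.dll.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the IDirect3D9 object returned by Direct3DCreate9().
struct FakeD3D9 { unsigned sdk_version; };

using CreateFn = FakeD3D9* (*)(unsigned);

// Hypothetical model of the DLL search path: file name -> its Direct3DCreate9 export.
std::map<std::string, CreateFn>& modules() {
    static std::map<std::string, CreateFn> table;
    return table;
}

// Hypothetical model of "load the DLL, then look up the export by name".
CreateFn load_export(const std::string& dll_name) {
    auto it = modules().find(dll_name);
    return it == modules().end() ? nullptr : it->second;
}

int g_intercepted = 0;  // how many calls went through the proxy

// Export of the original runtime (lives in the renamed _d3d9.dll).
FakeD3D9* Real_Direct3DCreate9(unsigned sdk_version) {
    static FakeD3D9 d3d;
    d3d.sdk_version = sdk_version;
    return &d3d;
}

// Export of the proxy: same name and signature as the original.
FakeD3D9* Proxy_Direct3DCreate9(unsigned sdk_version) {
    static CreateFn real = load_export("_d3d9.dll");  // resolved once, on first call
    if (!real) return nullptr;                        // original DLL not found
    ++g_intercepted;                                  // SHIM work would start here
    return real(sdk_version);                         // forward to the original
}

// Models the deployment step: rename the original, drop the proxy in its place.
void install_proxy() {
    modules()["_d3d9.dll"] = Real_Direct3DCreate9;
    modules()["d3d9.dll"]  = Proxy_Direct3DCreate9;
}

// The unmodified "game": it asks for d3d9.dll and calls whatever it is given.
unsigned demo_game_launch() {
    install_proxy();
    CreateFn create = load_export("d3d9.dll");
    assert(create != nullptr);
    const unsigned kSdkVersion = 32;  // arbitrary value for this demo
    FakeD3D9* d3d = create(kSdkVersion);
    return d3d ? d3d->sdk_version : 0;
}
```

The game code never changes; only the name lookup decides whether it reaches the proxy or the original. A real proxy would have to provide every export of the original DLL in the same way, which is where the portability cost of (1b) comes from.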

Comparison:
Approach (1a) has the benefit of maximum OS portability, as we do not have to deal with different OS versions and updates separately. It also handles the case where different OS versions export different DLL functions, so one common SHIM library can be used for all OS versions. On the downside, it has slightly worse compatibility with some games: some online games employ anti-cheat measures and will refuse to run if their code in memory has been modified. In short, approach (1a) is more flexible, but it won’t work for games that have anti-cheat measures.
Approach (1b) has the benefit that it can handle games that have anti-cheat detection, but it has inferior portability across OS versions and OS updates. We have to include the exported functions of every supported OS to maximize its portability. If the proxy is properly written to handle all these OS versions and updates, however, it should work on every version of Windows.
Approach (1b) is not applicable to some special DLLs, such as user32.dll and ntdll.dll, which have fixed calling addresses for every exported function. It is very hard for a proxy DLL to intercept these DLLs (we lack the software tools to do so). For these scenarios, (1a) is the only solution.
Hooking COM object methods
To hook the methods of a COM object, we first need to hook the object creation function, such as Direct3DCreate9() for an IDirect3D9 object, or IDirect3D9::CreateDevice() for an IDirect3DDevice9 object. Inside the hooked creation function, one of the following approaches can be used:
(2a) Using hooking libraries such as Detours or Mhook. The library should be set up when the target object is created.
(2b) Using a wrapper. We can create a wrapper object around the original COM object. The wrapper provides all the methods of the original object, and the creation function returns the wrapper instead of the original object. Usually, most of the methods are pass-through proxies to the original methods, except for the method we are really interested in, such as IDirect3DDevice9::Present().
(2c) Replacing v-table entries. We can replace the v-table entries, which are actually function pointers, with pointers to our proxy functions. (Refer to this link here for a simple code example.)
(2d) Replacing the v-table pointer. The v-table pointer of a COM object is the first data member in its memory layout. We can create a completely new v-table, fill it with pointers to the original functions and to our proxy functions, and then set the v-table pointer of the COM object to the address of this new v-table.
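The pattern of hooking the creation function and then applying approach (2c) inside it can be sketched with a C-style object whose v-table is an explicit struct of function pointers. This is a portable model, not D3D9 code: FakeDevice, FakeDeviceVtbl and all function names are hypothetical, and the two-entry v-table is far smaller than the real interface.

```cpp
#include <cassert>

struct FakeDevice;

// Hypothetical C-style v-table: a plain struct of function pointers.
struct FakeDeviceVtbl {
    int (*Present)(FakeDevice* self);
    int (*Clear)(FakeDevice* self);
};

// The v-table pointer is the first data member, as in a COM object.
struct FakeDevice {
    FakeDeviceVtbl* vtbl;
    int frames_presented;
};

int Real_Present(FakeDevice* self) { return ++self->frames_presented; }
int Real_Clear(FakeDevice*) { return 0; }

// One table shared by every device of this class.
FakeDeviceVtbl g_device_vtbl = { Real_Present, Real_Clear };

// The original object creation function.
FakeDevice* Real_CreateDevice() { return new FakeDevice{ &g_device_vtbl, 0 }; }

int (*g_orig_present)(FakeDevice*) = nullptr;  // saved original entry
int g_frames_captured = 0;

// Our proxy function: do the SHIM work, then call the original.
int Hooked_Present(FakeDevice* self) {
    ++g_frames_captured;          // capture and encode would happen here
    return g_orig_present(self);
}

// The hooked creation function: create the real object, then patch its table.
// Assumption: the table is writable. On Windows the page holding a v-table may
// be read-only and would first have to be made writable (e.g. VirtualProtect).
FakeDevice* Hooked_CreateDevice() {
    FakeDevice* dev = Real_CreateDevice();
    if (dev && dev->vtbl->Present != Hooked_Present) {  // patch only once
        g_orig_present = dev->vtbl->Present;
        dev->vtbl->Present = Hooked_Present;
    }
    return dev;
}

// The "game" calls Present() through the v-table, unaware of the hook.
int demo_vtable_entry_hook() {
    FakeDevice* dev = Hooked_CreateDevice();
    dev->vtbl->Present(dev);
    dev->vtbl->Present(dev);
    int presented = dev->frames_presented;
    delete dev;
    return presented;
}
```

Only the one entry we care about is touched, and the original pointer is saved so that the proxy can still forward to it.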

Comparison:
Approaches (2a) and (2c) have slightly worse compatibility with some games, similar to (1a).
Approaches (2b) and (2d) can be applied to selected objects, whereas (2a) and (2c) take effect globally, on all objects of the same class.
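The difference in scope between (2d) and (2c) can be shown with two objects that share one v-table. Again this is a hypothetical, portable model (FakeDevice, hook_one_object() and hook_whole_class() are names invented for the illustration), assuming all objects of a class point at the same table.

```cpp
#include <cassert>

struct FakeDevice;
struct FakeDeviceVtbl { int (*Present)(FakeDevice*); };
struct FakeDevice {
    FakeDeviceVtbl* vtbl;  // first member: the v-table pointer
    int id;
};

int Real_Present(FakeDevice* self) { return self->id; }
// The proxy adds 1000 so that a hooked call is visible in the return value.
int Hooked_Present(FakeDevice* self) { return 1000 + Real_Present(self); }

FakeDeviceVtbl g_class_vtbl = { Real_Present };  // shared by all devices
FakeDeviceVtbl g_private_vtbl = { nullptr };     // new table used by (2d)

// (2d): copy the table, patch the copy, repoint this one object at the copy.
void hook_one_object(FakeDevice* dev) {
    g_private_vtbl = *dev->vtbl;
    g_private_vtbl.Present = Hooked_Present;
    dev->vtbl = &g_private_vtbl;
}

// (2c): patch the entry in the table the object already points at.
void hook_whole_class(FakeDevice* dev) {
    dev->vtbl->Present = Hooked_Present;
}

struct ScopeResult { int a_after_2d; int b_after_2d; int b_after_2c; };

ScopeResult demo_hook_scope() {
    g_class_vtbl.Present = Real_Present;  // start from an unhooked class table
    FakeDevice a{ &g_class_vtbl, 1 };
    FakeDevice b{ &g_class_vtbl, 2 };

    hook_one_object(&a);                  // (2d) on object a only
    int a1 = a.vtbl->Present(&a);         // hooked
    int b1 = b.vtbl->Present(&b);         // b still uses the shared table

    hook_whole_class(&b);                 // (2c) patches the shared table
    int b2 = b.vtbl->Present(&b);         // now hooked as well
    return { a1, b1, b2 };
}
```

After hook_one_object(&a) only a is intercepted; after hook_whole_class(&b) every object still pointing at the shared table is intercepted, including ones created later.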
Approach (2b) potentially involves heavy coding effort, especially if the COM object has many methods, since every method must be implemented in the wrapper. With (2a), (2c) and (2d) we hook only the methods we need, and the code footprint is generally smaller.
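A minimal sketch of the wrapper approach (2b) makes the coding cost visible. IFakeDevice is a hypothetical three-method stand-in for IDirect3DDevice9, which has far more methods; the class and function names are assumptions made for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, heavily reduced stand-in for the IDirect3DDevice9 interface.
struct IFakeDevice {
    virtual ~IFakeDevice() = default;
    virtual int BeginScene() = 0;
    virtual int EndScene() = 0;
    virtual int Present() = 0;
};

// The original object, as the runtime would create it.
class RealDevice : public IFakeDevice {
public:
    int BeginScene() override { return 0; }
    int EndScene() override { return 0; }
    int Present() override { return ++frames_; }
private:
    int frames_ = 0;
};

int g_frames_captured = 0;

// The wrapper must implement every method of the interface, even though
// only Present() does anything beyond forwarding.
class DeviceWrapper : public IFakeDevice {
public:
    explicit DeviceWrapper(std::unique_ptr<IFakeDevice> real)
        : real_(std::move(real)) {}
    int BeginScene() override { return real_->BeginScene(); }  // pass-through
    int EndScene() override { return real_->EndScene(); }      // pass-through
    int Present() override {
        ++g_frames_captured;      // capture and encode would happen here
        return real_->Present();
    }
private:
    std::unique_ptr<IFakeDevice> real_;
};

std::unique_ptr<IFakeDevice> Real_CreateDevice() {
    return std::unique_ptr<IFakeDevice>(new RealDevice);
}

// The hooked creation function returns the wrapper instead of the original.
std::unique_ptr<IFakeDevice> Hooked_CreateDevice() {
    return std::unique_ptr<IFakeDevice>(new DeviceWrapper(Real_CreateDevice()));
}

// The "game" uses the returned object through the interface only.
int demo_wrapper() {
    std::unique_ptr<IFakeDevice> dev = Hooked_CreateDevice();
    dev->BeginScene();
    dev->EndScene();
    dev->Present();
    return dev->Present();  // frame count reported by the real device
}
```

Every additional interface method adds one more pass-through function to the wrapper, while the original object and its v-table are left untouched.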
Approaches (2a), (2b) and (2c) are safer in one respect: the v-table structure of the original object is retained. Approach (2d) builds an entirely new v-table for the object, which may lose information that the original v-table contains but the interface description does not document, and this may cause the game to crash or run abnormally. For example, D3D9 COM objects are implemented in C instead of C++, and their v-tables are built by the programmer instead of the compiler. The v-tables contain more information than the header file describes, so we are unable to create v-tables with the full functionality of the original ones.
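The risk with (2d) can be illustrated with a model in which the runtime's v-table carries one slot that the header does not describe. Everything here is an assumption made for the illustration: the layout, the InternalFlush slot and the function names are hypothetical and do not describe the actual contents of D3D9 v-tables.

```cpp
#include <cassert>

struct FakeDevice;

// What the header file documents.
struct DocumentedVtbl { int (*Present)(FakeDevice*); };

// What the runtime really builds: documented slots plus a hidden one.
struct RuntimeVtbl {
    DocumentedVtbl pub;                 // must stay the first member
    int (*InternalFlush)(FakeDevice*);  // hypothetical undocumented slot
};

struct FakeDevice { DocumentedVtbl* vtbl; };

int Real_Present(FakeDevice*) { return 1; }
int Real_InternalFlush(FakeDevice*) { return 7; }
int Hooked_Present(FakeDevice* dev) { return 100 + Real_Present(dev); }

RuntimeVtbl g_runtime_vtbl = { { Real_Present }, Real_InternalFlush };

// Runtime-internal code that relies on the hidden slot behind the public part.
int runtime_flush(FakeDevice* dev) {
    RuntimeVtbl* full = reinterpret_cast<RuntimeVtbl*>(dev->vtbl);
    return full->InternalFlush ? full->InternalFlush(dev)
                               : -1;  // -1 models a crash or misbehavior
}

// (2d): a table rebuilt from the header alone. The hidden slot stays empty
// because the hook has no way of knowing that it exists.
RuntimeVtbl g_rebuilt_vtbl = { { nullptr }, nullptr };
void replace_vtable_pointer(FakeDevice* dev) {
    g_rebuilt_vtbl.pub = *dev->vtbl;           // copies documented slots only
    g_rebuilt_vtbl.pub.Present = Hooked_Present;
    dev->vtbl = &g_rebuilt_vtbl.pub;
}

// (2c): patch one entry in place; the rest of the table is left as it was.
void replace_vtable_entry(FakeDevice* dev) {
    dev->vtbl->Present = Hooked_Present;
}

struct CompatResult { int present_2d, flush_2d, present_2c, flush_2c; };

CompatResult demo_hidden_slot() {
    g_runtime_vtbl.pub.Present = Real_Present;  // start from an unhooked table
    FakeDevice a{ &g_runtime_vtbl.pub };
    FakeDevice b{ &g_runtime_vtbl.pub };

    replace_vtable_pointer(&a);                 // (2d)
    int pa = a.vtbl->Present(&a);
    int fa = runtime_flush(&a);                 // hidden slot was lost

    replace_vtable_entry(&b);                   // (2c)
    int pb = b.vtbl->Present(&b);
    int fb = runtime_flush(&b);                 // hidden slot still present
    return { pa, fa, pb, fb };
}
```

Both hooks intercept Present(), but only the in-place patch keeps the part of the table that the hook author could not see.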