
Synchronizing Present Calls Between Applications on Distributed Systems with DirectX 12


Swap groups and swap barriers are well-known methods to synchronize buffer swaps between different windows on the same system and on distributed systems, respectively. Initially introduced for OpenGL, they were later extended through public NvAPI interfaces and supported in DirectX 9 through 12.

NVIDIA now introduces the concept of present barriers. They combine swap groups and swap barriers and provide a simple way to set up synchronized present calls within and between systems.

When an application requests to join the present barrier, the driver tries to set up either a swap group or a combination of a swap group and a swap barrier, depending on the current system configuration. The functions are again provided through public NvAPI interfaces.

The present barrier is only effective when an application is in a full-screen state with no window borders, as well as no desktop scaling or taskbar composition. If at least one of these requirements is not met, the present barrier disengages and reverts to a pending state until they all are. When the present barrier is in the pending state, no synchronization across displays happens.

Similarly, the present barrier works correctly only when displays are attached to the same GPU and set to the same timing. Displays can also be synchronized with either the Quadro Sync card or the NVLink connector.

Display synchronization occurs in one of two ways:

  • The displays have been configured to form a synchronized group or synchronized to an external sync source, or both, using the Quadro Sync add-on board.
  • The displays have been synchronized by creating a Mosaic display surface spanning the displays.

Once the display timings have been synchronized through one of these methods, the DX12 present barrier is available to use.

NvAPI interfaces

To set up synchronized present calls through the present barrier extension in NvAPI, the app must make sure that the present barrier is supported at all. If that’s the case, it must create a present barrier client, register needed DirectX resources, and join the present barrier.

Query present barrier support

Before any attempt to synchronize present calls, the application should first check whether present barrier synchronization is supported on the current OS, driver, and hardware configuration. This is done by calling the corresponding query function with the desired D3D12 device as a parameter.

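ID3D12Device* device = nullptr;
... // initialize the device
bool supported = false;
// Keep the call outside assert() so that it still runs when NDEBUG is defined.
NvAPI_Status queryStatus = NvAPI_D3D12_QueryPresentBarrierSupport(device, &supported);
assert(queryStatus == NVAPI_OK);
if (supported) {
  LOG("D3D12 present barrier is supported on this system.");
  ...
}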
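Because assert is compiled out when NDEBUG is defined, a status check that must also hold in release builds needs a different mechanism. The following is a minimal sketch of one option. The Status enum, checkStatus, and querySupport are hypothetical stand-ins and not part of NvAPI; the only convention borrowed from the library is that the success code is zero, as with NVAPI_OK.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for NvAPI_Status; real code would use the NvAPI type.
// As with NVAPI_OK, the success value is 0.
enum Status { STATUS_OK = 0, STATUS_ERROR = -1 };

// Throws if the call failed. Unlike assert(), this check is not removed
// when NDEBUG is defined, so the wrapped call is always evaluated.
inline void checkStatus(Status status, const char* what) {
    if (status != STATUS_OK) {
        throw std::runtime_error(std::string(what) + " failed with status " +
                                 std::to_string(static_cast<int>(status)));
    }
}

// Hypothetical call that reports support through an out parameter,
// mirroring the shape of the query function above.
inline Status querySupport(bool deviceValid, bool* supported) {
    if (!deviceValid || supported == nullptr) return STATUS_ERROR;
    *supported = true;
    return STATUS_OK;
}
```

With such a helper, a call site would read checkStatus(querySupport(true, &supported), "querySupport"), and a failure surfaces as an exception in both debug and release builds.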

Create a present barrier client handle

If the system offers present barrier support, the app can create a present barrier client by supplying the D3D12 device and DXGI swap chain. The handle is used to register needed resources, join or leave the present barrier, and query frame statistics.

IDXGISwapChain* swapChain = nullptr;
... // initialize the swap chain
NvPresentBarrierClientHandle pbClientHandle = nullptr;
NvAPI_Status createStatus = NvAPI_D3D12_CreatePresentBarrierClient(device, swapChain, &pbClientHandle);
assert(createStatus == NVAPI_OK);

Register present barrier resources

After client creation, the present barrier needs access to the swap chain’s buffer resources and a fence object for proper frame synchronization. The fence value is incremented by the present barrier at each frame and must not be changed by the app. However, the app may use it to synchronize command allocator usage between the host and device. The function must be called again whenever the swap chain’s buffers change.

ID3D12Fence* pbFence = nullptr; // the app may wait on the fence but must not signal it
HRESULT fenceResult = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&pbFence));
assert(SUCCEEDED(fenceResult));
ID3D12Resource** backBuffers;
unsigned int backBufferCount;
... // query buffers from swap chain
NvAPI_Status registerStatus = NvAPI_D3D12_RegisterPresentBarrierResources(pbClientHandle, pbFence, backBuffers, backBufferCount);
assert(registerStatus == NVAPI_OK);

Join the present barrier

After creating the present barrier client handle and registering the scanout resources, the application can join present barrier synchronization. Future present calls are then synchronized with other clients.

NV_JOIN_PRESENT_BARRIER_PARAMS params = {};
params.dwVersion = NV_JOIN_PRESENT_BARRIER_PARAMS_VER1;
NvAPI_Status joinStatus = NvAPI_JoinPresentBarrier(pbClientHandle, &params);
assert(joinStatus == NVAPI_OK);

Leave the present barrier

A similar function exists to leave present barrier synchronization. The client is left intact such that the app can easily join again.

NvAPI_Status leaveStatus = NvAPI_LeavePresentBarrier(pbClientHandle);
assert(leaveStatus == NVAPI_OK);
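Since joining and leaving come as a pair, one option is to tie them to a scope so that the client always leaves, even on an early return. The following is a minimal sketch of that idea and not something the NvAPI library provides. ScopedBarrierMembership and its two callables are hypothetical names; real code would pass lambdas that wrap NvAPI_JoinPresentBarrier and NvAPI_LeavePresentBarrier and return true on NVAPI_OK.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical RAII helper: runs `join` on construction and `leave` on
// destruction, but only if joining succeeded. Both callables return true
// on success.
class ScopedBarrierMembership {
public:
    ScopedBarrierMembership(const std::function<bool()>& join,
                            std::function<bool()> leave)
        : leave_(std::move(leave)), joined_(join()) {}

    ~ScopedBarrierMembership() {
        if (joined_) leave_();
    }

    // Non-copyable: exactly one object owns the membership.
    ScopedBarrierMembership(const ScopedBarrierMembership&) = delete;
    ScopedBarrierMembership& operator=(const ScopedBarrierMembership&) = delete;

    bool joined() const { return joined_; }

private:
    std::function<bool()> leave_;
    bool joined_;
};
```

The present barrier client itself is untouched by this wrapper, so the app can construct a new ScopedBarrierMembership later to join again.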

Application’s main loop

When everything is set up, the app can execute its main loop without any changes, including the present call. The present barrier handles synchronization by itself. While the app can choose to use the fence provided to the present barrier for host and device synchronization, it is also practical to use its own dedicated one.
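Whichever fence the app picks, host and device synchronization of command allocators usually comes down to the same bookkeeping: remember which fence value marks the end of the work recorded with an allocator, and reset that allocator only after the fence has reached the value. The sketch below models this with plain integers so that it runs without a GPU. AllocatorRing and its methods are hypothetical names, and how fence values map to frames is an assumption that depends on which fence is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping for a ring of command allocators. Each slot
// remembers the fence value that must be reached before it may be reset.
class AllocatorRing {
public:
    explicit AllocatorRing(std::size_t slotCount) : readyAt_(slotCount, 0) {
        assert(slotCount > 0);
    }

    // Slot used for the given frame index (round-robin).
    std::size_t slotFor(std::uint64_t frameIndex) const {
        return static_cast<std::size_t>(frameIndex % readyAt_.size());
    }

    // True if the slot's previous work has finished, judged by the fence's
    // completed value (ID3D12Fence::GetCompletedValue() in real code).
    bool canReuse(std::size_t slot, std::uint64_t completedValue) const {
        return completedValue >= readyAt_[slot];
    }

    // Record the fence value that marks the end of the work just submitted
    // with this slot.
    void markSubmitted(std::size_t slot, std::uint64_t fenceValue) {
        readyAt_[slot] = fenceValue;
    }

private:
    std::vector<std::uint64_t> readyAt_;
};
```

When canReuse returns false, real code would typically block on the fence with ID3D12Fence::SetEventOnCompletion and wait for the event before resetting the allocator.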

Query statistics

While the client is registered to the present barrier, the app can query frame and synchronization statistics at any time to make sure that everything works as intended.

NV_PRESENT_BARRIER_FRAME_STATISTICS stats = {};
stats.dwVersion = NV_PRESENT_BARRIER_FRAME_STATICS_VER1;
NvAPI_Status statsStatus = NvAPI_QueryPresentBarrierFrameStatistics(pbClientHandle, &stats);
assert(statsStatus == NVAPI_OK);

The present barrier statistics object filled by the function call supplies several useful values.

  • SyncMode: The present barrier mode of the client from the last present call. Possible values:
    • PRESENT_BARRIER_NOT_JOINED: The client has not joined the present barrier.
    • PRESENT_BARRIER_SYNC_CLIENT: The client joined the present barrier but is not synchronized with any other clients.
    • PRESENT_BARRIER_SYNC_SYSTEM: The client joined the present barrier and is synchronized with other clients within the system.
    • PRESENT_BARRIER_SYNC_CLUSTER: The client joined the present barrier and is synchronized with other clients within the system and across systems.
  • PresentCount: The total count of times that a frame has been presented from the client after it joined the present barrier successfully.
  • PresentInSyncCount: The total number of frames presented from the client while the returned SyncMode has been PRESENT_BARRIER_SYNC_SYSTEM or PRESENT_BARRIER_SYNC_CLUSTER. It resets to 0 if SyncMode switches away from those values.
  • FlipInSyncCount: The total number of flips from the client while the returned SyncMode has been PRESENT_BARRIER_SYNC_SYSTEM or PRESENT_BARRIER_SYNC_CLUSTER. It resets to 0 if SyncMode switches away from those values.
  • RefreshCount: The total number of v-blanks while the returned SyncMode of the client has been PRESENT_BARRIER_SYNC_SYSTEM or PRESENT_BARRIER_SYNC_CLUSTER. It resets to 0 if SyncMode switches away from those values.
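Because the in-sync counters reset whenever SyncMode leaves the synchronized states, comparing two consecutive samples is enough to notice a loss of synchronization. The following is a minimal sketch of such a check. The SyncMode enum and BarrierStats struct are hypothetical mirrors of the fields described above; real code would read the values from NV_PRESENT_BARRIER_FRAME_STATISTICS.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the sync modes listed above.
enum class SyncMode { NotJoined, SyncClient, SyncSystem, SyncCluster };

// Hypothetical subset of the statistics fields used by this check.
struct BarrierStats {
    SyncMode syncMode;
    std::uint32_t presentInSyncCount;
};

// True when the client is synchronized with at least one other client.
inline bool isInSync(SyncMode mode) {
    return mode == SyncMode::SyncSystem || mode == SyncMode::SyncCluster;
}

// Compares two consecutive samples and reports whether synchronization was
// lost in between: either the mode left the in-sync states, or the in-sync
// counter went backwards, which indicates a reset.
inline bool syncWasLost(const BarrierStats& previous, const BarrierStats& current) {
    if (!isInSync(previous.syncMode)) return false;  // nothing to lose
    if (!isInSync(current.syncMode)) return true;
    return current.presentInSyncCount < previous.presentInSyncCount;
}
```

This only catches a reset if the counter has not yet climbed past its previous value again, so the app would need to sample often enough for the check to be meaningful.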

Sample application

A dedicated sample app is available in the NVIDIA DesignWorks Samples GitHub repo. It features an adjustable and moving pattern of colored bars and columns to check visually for synchronization quality (Figure 1). The app also supports alternate frame rendering on multi-GPU setups and stereoscopic rendering. During runtime, it can join or leave the present barrier synchronization.

For source code and usage details, see the project at nvpro-samples/dx12_present_barrier.

Screenshot of a black screen with a vertical red bar, horizontal green bar, and statistics in the upper left corner.
Figure 1. Sample application with moving bars and lines, and real-time statistics.

Conclusion

Present barrier synchronization is an easy, high-level way to realize synchronized present calls on multiple displays, both on a single system and across multiple distributed systems. The interface is fully contained inside the NvAPI library and consists of only six setup functions, while the complex management concepts are hidden from the user-facing code.