Advanced API Performance: Barriers

This post covers best practices for barriers on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.

For the best performance on our hardware, here’s what you should and shouldn’t do when you’re using barriers with DX12 or Vulkan. This is updated from DX12 Do’s And Don’ts.

  • Minimize the use of barriers and fences. Any barrier or fence can limit parallelism. I’ve seen redundant barriers and their associated wait-for-idle operations become a major performance problem in DX11 to DX12 ports.
    • The DX11 driver does a great job of reducing barriers for you. Under DX12, you must do it yourself.
  • Always use the minimum set of resource usage flags. Redundant flags may trigger redundant flushes and stalls and slow down your game unnecessarily. Again, I’ve seen redundant or overly conservative barrier flags and their associated wait-for-idle operations become a major performance problem in DX11 to DX12 ports.
    • Stay away from using D3D12_RESOURCE_STATE_GENERIC_READ unless you really need every single flag that is set in this combination of flags.
  • Specify the minimum set of targets in ID3D12GraphicsCommandList::ResourceBarrier. Listing resources that don’t need a transition creates false dependencies.
  • Group barriers in one call to ID3D12GraphicsCommandList::ResourceBarrier. This way, the driver can pick the worst case across the whole batch once instead of going through the barriers one by one.
  • Use a single NULL-to-NULL aliasing resource barrier rather than many (for example, 200+) resource-to-NULL barriers. The two can be equivalent in the driver, so processing every individual barrier can waste CPU cycles.
  • Use split barriers when possible.
    • Use the D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY and D3D12_RESOURCE_BARRIER_FLAG_END_ONLY flags. This helps the driver optimize how it schedules the transition work.
  • Use fences to signal events or advance across calls to ExecuteCommandLists.
  • Don’t insert redundant barriers:
    • A transition from D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE to D3D12_RESOURCE_STATE_RENDER_TARGET and back without any draw calls in-between is redundant.
    • Avoid read-to-read barriers. Get the resource in the right state for all subsequent reads.
  • Don’t use D3D12_RESOURCE_STATE_GENERIC_READ without good reason.
    • For write-to-read transitions, make sure the target state includes every read state needed before the next transition to write. This is done from the API by combining read state flags, and is preferred over transitioning from read-to-read in subsequent ResourceBarrier calls.
  • Don’t use the D3D12_RESOURCE_STATE_COMMON state for non-initial states unless it is absolutely needed. D3D12_RESOURCE_STATE_COMMON can be promoted to both read and write states, so it makes the driver pick the worst-case synchronization.
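
Several of the tips above (skip redundant transitions, combine read states, group barriers into one call) can be enforced by a small tracking layer in the engine. The following is a minimal, portable sketch of that idea and not NVIDIA's or Microsoft's implementation. `BarrierBatch`, `PendingTransition`, `Track`, `Request`, `Flush`, and the integer resource ids are hypothetical names invented for illustration. The state constants are local stand-ins whose values mirror the corresponding D3D12_RESOURCE_STATES bits, so the sketch compiles without the D3D12 headers.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-ins for a few D3D12_RESOURCE_STATES bits. The values mirror d3d12.h,
// but these names are local to this sketch.
constexpr std::uint32_t kRenderTarget           = 0x4;
constexpr std::uint32_t kNonPixelShaderResource = 0x40;
constexpr std::uint32_t kPixelShaderResource    = 0x80;

// Hypothetical record standing in for a D3D12 transition barrier.
struct PendingTransition {
    int resourceId;        // hypothetical handle in place of ID3D12Resource*
    std::uint32_t before;  // state at the time of the last flush
    std::uint32_t after;   // state requested for the next use
};

// Hypothetical helper that drops redundant transitions and batches the rest.
class BarrierBatch {
public:
    // Register a resource together with the state it is currently in.
    void Track(int id, std::uint32_t state) { current_[id] = state; }

    // Ask for `id` to be in state `after` before the next draw or dispatch.
    void Request(int id, std::uint32_t after) {
        auto it = std::find_if(pending_.begin(), pending_.end(),
            [id](const PendingTransition& t) { return t.resourceId == id; });
        if (it != pending_.end()) {
            // No work has consumed the earlier request yet, so retarget it.
            it->after = after;
            // A round trip back to the original state cancels out entirely.
            if (it->before == it->after) pending_.erase(it);
        } else if (current_[id] != after) {
            pending_.push_back({id, current_[id], after});
        }
        // A request for the state the resource is already in queues nothing.
        current_[id] = after;
    }

    // Hand back everything queued so far. In real D3D12 code this is where a
    // single ResourceBarrier call would submit the whole array.
    std::vector<PendingTransition> Flush() {
        std::vector<PendingTransition> batch;
        batch.swap(pending_);
        return batch;
    }

private:
    std::unordered_map<int, std::uint32_t> current_;
    std::vector<PendingTransition> pending_;
};

// Example: three textures, only one of which really needs a barrier.
std::vector<PendingTransition> RecordExampleBarriers() {
    BarrierBatch batch;
    batch.Track(1, kRenderTarget);
    batch.Track(2, kPixelShaderResource);
    batch.Track(3, kPixelShaderResource);

    // Write-to-read: request every read state needed before the next write.
    batch.Request(1, kPixelShaderResource | kNonPixelShaderResource);
    // Round trip with no draw call in between: cancels out.
    batch.Request(2, kRenderTarget);
    batch.Request(2, kPixelShaderResource);
    // Already in the requested state: skipped.
    batch.Request(3, kPixelShaderResource);

    return batch.Flush();
}
```

In this example only texture 1 ends up in the batch, with both read states combined into a single transition. A real engine would key the map on resource pointers, handle subresources, and call `Flush` right before the first draw or dispatch that depends on the new states.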