This post covers best practices for barriers on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.
For the best performance on our hardware, here’s what you should and shouldn’t do when you’re using barriers with DX12 or Vulkan. This is updated from DX12 Do’s And Don’ts.
Recommended
- Minimize the use of barriers and fences. Any barrier or fence can limit parallelism. I’ve seen redundant barriers and their associated wait-for-idle operations as a major performance problem in DX11-to-DX12 ports.
  - The DX11 driver does a great job of reducing barriers for you. Under DX12, that job is yours.
- Always use the minimum set of resource state flags. Redundant flags may trigger redundant flushes and stalls and slow down your game unnecessarily. Again, I’ve seen redundant or overly conservative barrier flags and their associated wait-for-idle operations as a major performance problem in DX11-to-DX12 ports.
  - Stay away from `D3D12_RESOURCE_STATE_GENERIC_READ` unless you really need every single flag that is set in this combination of flags.
- Specify the minimum set of targets in `ID3D12GraphicsCommandList::ResourceBarrier`. Adding false dependencies adds redundancy.
- Group barriers in one call to `ID3D12GraphicsCommandList::ResourceBarrier`. This way, the driver can pick the worst case once instead of going through all barriers sequentially.
- Use a single NULL-to-NULL aliasing resource barrier rather than many (for example, 200+) resource-to-NULL barriers. The driver may treat the two approaches identically, in which case processing every individual barrier just wastes CPU cycles.
- Use split barriers when possible.
  - Use the `D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY` and `D3D12_RESOURCE_BARRIER_FLAG_END_ONLY` flags. This helps the driver optimize the scheduling of the transition workloads.
- Use fences to signal events or advance across calls to `ExecuteCommandLists`.
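The batching and split-barrier tips above can be sketched roughly as follows. So that it compiles without the Windows SDK, the sketch uses small stand-in types (`State`, `BarrierFlag`, `Barrier`, `FakeCommandList`) that only mimic the shape of the D3D12 state enum, barrier flags, barrier struct, and command list. The enum values are assumed to match `d3d12.h`, and `BarrierBatch` with its methods is a hypothetical helper, not part of D3D12.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for a few D3D12_RESOURCE_STATES values (assumed to match d3d12.h).
enum State : uint32_t {
    STATE_RENDER_TARGET             = 0x4,
    STATE_UNORDERED_ACCESS          = 0x8,
    STATE_NON_PIXEL_SHADER_RESOURCE = 0x40,
    STATE_PIXEL_SHADER_RESOURCE     = 0x80,
    STATE_COPY_SOURCE               = 0x800,
};

// Stand-ins for D3D12_RESOURCE_BARRIER_FLAGS.
enum BarrierFlag : uint32_t { FLAG_NONE = 0x0, FLAG_BEGIN_ONLY = 0x1, FLAG_END_ONLY = 0x2 };

// Stand-in for a transition-type D3D12_RESOURCE_BARRIER.
struct Barrier {
    int         resource;  // stand-in for ID3D12Resource*
    uint32_t    before;
    uint32_t    after;
    BarrierFlag flag;
};

// Stand-in for the command list: records each ResourceBarrier call it receives.
struct FakeCommandList {
    std::vector<std::vector<Barrier>> calls;
    void ResourceBarrier(std::size_t count, const Barrier* barriers) {
        calls.emplace_back(barriers, barriers + count);
    }
};

// Hypothetical helper: queue transitions, then submit all of them in one call.
class BarrierBatch {
public:
    void Transition(int res, uint32_t before, uint32_t after, BarrierFlag flag = FLAG_NONE) {
        assert(before != after);  // a same-state transition would be redundant
        pending_.push_back({res, before, after, flag});
    }
    // Split barrier: begin as soon as the last write is done...
    void BeginTransition(int res, uint32_t before, uint32_t after) {
        Transition(res, before, after, FLAG_BEGIN_ONLY);
    }
    // ...and end it right before the first use in the new state.
    void EndTransition(int res, uint32_t before, uint32_t after) {
        Transition(res, before, after, FLAG_END_ONLY);
    }
    // Submit everything queued so far as a single ResourceBarrier call.
    void Flush(FakeCommandList& cl) {
        if (pending_.empty()) return;  // nothing queued: issue no call at all
        cl.ResourceBarrier(pending_.size(), pending_.data());
        pending_.clear();
    }
private:
    std::vector<Barrier> pending_;
};
```

With the real API, the equivalent of `Flush` would be one `ResourceBarrier(NumBarriers, pBarriers)` call on the command list with an array of `D3D12_RESOURCE_BARRIER` entries. Unrelated work recorded between the begin and end halves of a split barrier gives the driver room to overlap the transition with it.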
Not recommended
- Don’t insert redundant barriers:
  - A transition from `D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE` to `D3D12_RESOURCE_STATE_RENDER_TARGET` and back without any draw calls in between is redundant.
  - Avoid read-to-read barriers. Get the resource into the right state for all subsequent reads.
- Don’t use `D3D12_RESOURCE_STATE_GENERIC_READ` without good reason.
  - For transitions from write to read states, make sure the target state includes every read state required before the next transition to write. You do this in the API by combining read state flags, which is preferred over transitioning from read to read in subsequent `ResourceBarrier` calls.
- Don’t use the `D3D12_RESOURCE_STATE_COMMON` state for non-initial states unless it is absolutely needed. `D3D12_RESOURCE_STATE_COMMON` can be promoted to both read and write states, so it forces the driver to pick the worst-case synchronization.
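One way to avoid redundant and read-to-read barriers is to track each resource's current state on the CPU and emit a transition only when the current state does not already cover the requested one. The following is a minimal sketch of that idea, not a definitive implementation. The constants are stand-ins assumed to match the `D3D12_RESOURCE_STATES` values in `d3d12.h`, and `TrackedResource`, `Transition`, and `IsReadOnly` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in constants; values assumed to match D3D12_RESOURCE_STATES in d3d12.h.
constexpr uint32_t kCommon                 = 0x0;
constexpr uint32_t kRenderTarget           = 0x4;
constexpr uint32_t kDepthRead              = 0x20;
constexpr uint32_t kNonPixelShaderResource = 0x40;
constexpr uint32_t kPixelShaderResource    = 0x80;
constexpr uint32_t kGenericRead            = 0x1 | 0x2 | 0x40 | 0x80 | 0x200 | 0x800;

// Every bit that denotes a read-only state in this sketch.
constexpr uint32_t kReadOnlyBits = kGenericRead | kDepthRead;

// True when `s` is a non-empty combination of read-only state bits.
inline bool IsReadOnly(uint32_t s) {
    return s != kCommon && (s & ~kReadOnlyBits) == 0;
}

// Hypothetical description of one transition that should be recorded.
struct Transition {
    uint32_t before;
    uint32_t after;
};

// Hypothetical CPU-side tracker for a single resource.
class TrackedResource {
public:
    explicit TrackedResource(uint32_t initial) : state_(initial) {}

    // Returns true and fills `out` when a barrier is needed. Returns false when
    // the resource is already in `needed`, or is in a combined read state that
    // includes `needed`, so a read-to-read barrier is skipped.
    bool Require(uint32_t needed, Transition* out) {
        assert(out != nullptr);
        if (state_ == needed) return false;
        if (IsReadOnly(state_) && IsReadOnly(needed) && (state_ & needed) == needed) {
            return false;
        }
        out->before = state_;
        out->after  = needed;
        state_      = needed;
        return true;
    }

    uint32_t state() const { return state_; }

private:
    uint32_t state_;
};
```

In this sketch, a render target that is later sampled by both pixel and compute shaders would be moved once to `kPixelShaderResource | kNonPixelShaderResource`. Later requests for either read state alone then need no barrier, and the combined mask sets only two of the six bits in `kGenericRead`.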