Advanced API Performance: Synchronization

Synchronization in graphics programming refers to the coordination and control of concurrent operations to ensure the correct and predictable execution of rendering tasks. Improper synchronization across the CPU and GPU can lead to slow performance, race conditions, and visual artifacts.

Recommended

If running workloads asynchronously, make sure that they stress different GPU units. For example, pair bandwidth-heavy tasks with math-heavy tasks: overlap a z-prepass with a BVH build or with post-processing.
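The pairing advice above can be illustrated with a toy model. This is only a sketch under strong assumptions: each workload is reduced to two hypothetical estimates (time spent limited by memory bandwidth and time spent limited by math throughput), and overlap is assumed to be perfect. The `WorkloadEstimate` type and the function names are invented for illustration and do not correspond to any graphics API. Real GPUs have many more units and contention effects, so treat the result as a best-case bound, not a prediction.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-workload estimate: how long the workload keeps each
// class of GPU unit busy. These numbers would come from profiling.
struct WorkloadEstimate {
    double memoryMs;  // time limited by memory bandwidth
    double mathMs;    // time limited by ALU/math throughput
};

// Toy assumption: run alone, a workload takes as long as its busiest unit.
double standaloneMs(const WorkloadEstimate& w) {
    assert(w.memoryMs >= 0.0 && w.mathMs >= 0.0);
    return std::max(w.memoryMs, w.mathMs);
}

// Toy assumption: with perfect overlap, each unit processes the combined
// demand of both workloads, and the busiest unit sets the total time.
double idealOverlapMs(const WorkloadEstimate& a, const WorkloadEstimate& b) {
    return std::max(a.memoryMs + b.memoryMs, a.mathMs + b.mathMs);
}

// Best-case time saved by overlapping instead of running back to back.
double idealOverlapGainMs(const WorkloadEstimate& a, const WorkloadEstimate& b) {
    return standaloneMs(a) + standaloneMs(b) - idealOverlapMs(a, b);
}
```

In this model, pairing a bandwidth-heavy workload `{2.0, 0.5}` with a math-heavy workload `{0.5, 2.0}` saves up to 1.5 ms, while pairing two copies of the bandwidth-heavy workload saves nothing, because both compete for the same unit.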
Always verify whether the asynchronous implementation is faster across the different architectures.
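One possible way to act on this is to decide per device, from measured GPU times, whether the asynchronous path is worth enabling. The sketch below assumes you have already collected frame-time samples (in milliseconds) for both paths on the target GPU, for example from timestamp queries. The function names and the `minGainMs` threshold are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Median of the samples (upper median for even counts). The median is used
// here because it is less sensitive to occasional frame-time spikes.
double medianMs(std::vector<double> samples) {
    assert(!samples.empty());
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Enable the asynchronous path only if it beats the serial path by at
// least minGainMs on this device. The threshold is an assumption; choose
// it based on the measurement noise you observe.
bool shouldEnableAsync(const std::vector<double>& serialMs,
                       const std::vector<double>& asyncMs,
                       double minGainMs) {
    return medianMs(serialMs) - medianMs(asyncMs) >= minGainMs;
}
```

Running this check on each architecture you ship on avoids assuming that a gain measured on one GPU carries over to another.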
Asynchronous work can belong to different frames. Overlapping work from different frames can help you find better-paired workloads.
Wait on and signal the absolute minimum number of semaphores/fences. Every unnecessary semaphore/fence can introduce a bubble in the pipeline.
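One way to reduce the number of waits is sketched below. It assumes each producer queue signals a single monotonically increasing fence value after its submissions, as with a D3D12 fence or a Vulkan timeline semaphore. Under that assumption, waiting on the highest value a consumer needs from a queue also covers every lower value from the same queue, so several dependencies collapse into one wait per producer queue. The `Dependency` type and `collapseWaits` function are hypothetical bookkeeping, not part of either API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical record: the consumer needs all work on `producerQueue`
// up to and including `fenceValue` to have finished.
struct Dependency {
    int producerQueue;
    std::uint64_t fenceValue;
};

// Collapse all dependencies into one wait per producer queue by keeping
// only the highest fence value required from each queue.
std::map<int, std::uint64_t> collapseWaits(const std::vector<Dependency>& deps) {
    std::map<int, std::uint64_t> waits;
    for (const Dependency& d : deps) {
        assert(d.fenceValue > 0);  // 0 is treated as "nothing signaled yet"
        std::uint64_t& highest = waits[d.producerQueue];  // starts at 0
        highest = std::max(highest, d.fenceValue);
    }
    return waits;
}
```

The caller would then issue one queue-side wait per map entry instead of one per dependency.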
Use GPU profiling tools (NVIDIA Nsight Graphics in GPU trace mode, PIX, or GPUView) to see how well work overlaps and fences play together without stalling one queue or another.
To avoid extra synchronization and resource barriers, asynchronous copy/transfer work can be done on the compute queue.

Not recommended

Do not create queues that you don’t use.
- Each additional queue adds processing overhead.
- Multiple asynchronous compute queues will not overlap, due to the OS scheduler, unless hardware scheduling is enabled. For more information, see Hardware Accelerated GPU Scheduling.
Avoid tiny asynchronous tasks and group them if possible. Asynchronous workloads that take <0.2 ms are unlikely to show any benefit, as this is approximately the time it takes to resolve fences when hardware scheduling is not enabled.
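A minimal sketch of such grouping follows. It assumes each task carries a profiled time estimate and that the tasks can be submitted together in their original order. The `AsyncTask` type, the `groupTasks` function, and the greedy strategy are illustrative choices, not a prescribed method. The threshold would typically be set to around 0.2 ms or higher.

```cpp
#include <cassert>
#include <vector>

// Hypothetical task description with a profiled GPU time estimate.
struct AsyncTask {
    int id;
    double estimatedMs;
};

// Greedily pack tasks, in order, into batches whose estimated time reaches
// minBatchMs. Leftover tasks that cannot fill a batch on their own are
// appended to the last batch instead of being submitted separately.
std::vector<std::vector<AsyncTask>> groupTasks(const std::vector<AsyncTask>& tasks,
                                               double minBatchMs) {
    assert(minBatchMs > 0.0);
    std::vector<std::vector<AsyncTask>> batches;
    std::vector<AsyncTask> current;
    double currentMs = 0.0;
    for (const AsyncTask& task : tasks) {
        current.push_back(task);
        currentMs += task.estimatedMs;
        if (currentMs >= minBatchMs) {
            batches.push_back(current);
            current.clear();
            currentMs = 0.0;
        }
    }
    if (!current.empty()) {
        if (batches.empty()) {
            batches.push_back(current);  // everything together is still small
        } else {
            batches.back().insert(batches.back().end(), current.begin(), current.end());
        }
    }
    return batches;
}
```

If the only batch produced is still below the threshold, it may be better to run that work on the graphics queue than to pay for cross-queue synchronization.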
Avoid using fences to synchronize work within the queue. Command lists/buffers are guaranteed to be executed in order of submission within a command queue by specification.
Semaphores/fences should not be used instead of resource barriers. They are far more expensive and serve a different purpose.
Do not implement low-occupancy workloads with the intention to align them with more work on the graphics queue. GPU capabilities may change and low-occupancy work might become a long, trailing tail that stalls another queue.

