
Advanced API Performance: Pipeline State Objects


This post covers best practices when working with pipeline state objects on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.

Pipeline state objects (PSOs) define how input data is interpreted and rendered by the hardware when work is submitted to the GPU. Proper management of PSOs is essential for efficient use of system resources and for smooth gameplay.

Recommended

  • Create PSOs asynchronously on worker threads.
    • PSO creation is where shader compilation, and the related stalls, happen.
  • Start with generic PSOs that use generic, quick-to-compile shaders, and generate specialized variants later.
    • This gets you up and running faster, even if you are not yet running the most optimal PSO or shader.
    • Shaders shared between PSOs are compiled only once.
  • Avoid compiling PSOs at runtime, at the moment they are first needed, as this will most likely lead to stalls.
    • The driver-managed shader disk cache may help here by reusing compilation results from earlier runs.
  • Use PSO libraries (ID3D12PipelineLibrary in D3D12).
  • Use identical, sensible defaults for don’t-care fields wherever possible.
    • This creates more opportunities for PSO reuse.
  • Use the /all_resources_bound (D3DCOMPILE_ALL_RESOURCES_BOUND) compile flag if possible.
    • The compiler can then do a better job of optimizing texture accesses.
  • Sort draw calls by PSO and by tessellation usage to minimize state changes.
  • Remember that PSO creation is where shaders are compiled and stalls are introduced.
    • It is really important to create PSOs asynchronously and well before they are used.
    • Be careful with the thread priorities of PSO compilation threads.
    • Use idle priority when there is no urgency, to avoid slowing down game threads.
    • Consider temporarily boosting priorities when a PSO is needed soon.

Not recommended

  • Toggling between compute and graphics on the same command queue more than necessary.
    • This is still a heavyweight switch.
  • Toggling tessellation on and off more than necessary.
    • This is also a heavyweight switch.
  • Using FXC to generate DXBC in DX12.
    • This forces an extra DXBC-to-DXIL translation, increasing compilation time and PSO library size.
  • Serializing large numbers of PSOs (hundreds of thousands) to disk in PSO libraries at once.
    • This may significantly increase system memory usage.
    • Use the “miss and update the PSO library” strategy instead: populate the library incrementally, adding each PSO when a lookup misses.
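
The advice on creating PSOs asynchronously, starting from generic PSOs, and avoiding compilation at the moment a PSO is needed can be sketched in portable C++. This is a minimal sketch under stated assumptions, not a D3D12 implementation: Pso, CompilePso and PsoCache are hypothetical names, and CompilePso stands in for the expensive call (such as ID3D12Device::CreateGraphicsPipelineState) that should stay off the render thread. std::async serves as a simple worker thread here; an engine would more likely use its own job system.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a compiled PSO (an ID3D12PipelineState in D3D12).
struct Pso {
    std::string name;
    bool specialized;  // false for the quick-to-compile generic fallback
};

// Hypothetical stand-in for the expensive PSO creation / shader compilation.
Pso CompilePso(const std::string& key, bool specialized) {
    return Pso{key, specialized};
}

class PsoCache {
public:
    // Start compiling the specialized PSO on a worker thread, ideally well
    // before the first draw that needs it (e.g. at level or material load).
    void Prefetch(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_.count(key) || ready_.count(key)) return;
        pending_[key] = std::async(std::launch::async, CompilePso, key, true);
    }

    // Render-thread lookup: never blocks on compilation. Returns the
    // specialized PSO once it has finished, otherwise the generic fallback.
    const Pso& Get(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto r = ready_.find(key);
        if (r != ready_.end()) return r->second;
        auto p = pending_.find(key);
        if (p != pending_.end() &&
            p->second.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            auto inserted = ready_.emplace(key, p->second.get());
            pending_.erase(p);
            return inserted.first->second;  // references stay valid on rehash
        }
        return generic_;
    }

private:
    std::mutex mutex_;
    Pso generic_ = CompilePso("generic", false);  // compiled once, up front
    std::unordered_map<std::string, std::future<Pso>> pending_;
    std::unordered_map<std::string, Pso> ready_;
};
```

Draws keep rendering with the generic PSO until the specialized one is ready, so a late compile costs some efficiency for a few frames instead of a visible hitch.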
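
Using identical defaults for don’t-care fields can be illustrated as a canonicalization step applied before a PSO description is turned into a cache key. PsoDesc, Canonicalize and PsoKey are hypothetical, and the struct is a tiny subset of a real D3D12_GRAPHICS_PIPELINE_STATE_DESC. The idea: fields that have no effect in the current state (the depth comparison function when depth testing is off, blend factors when blending is off) are reset to one shared value, so descriptions that differ only in those fields map to the same key and can share a single PSO.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified pipeline state.
enum class CompareFunc : uint8_t { Never, Less, LessEqual, Always };
enum class Blend : uint8_t { Zero, One, SrcAlpha, InvSrcAlpha };

struct PsoDesc {
    bool depthEnable = false;
    CompareFunc depthFunc = CompareFunc::Less;  // don't care if !depthEnable
    bool blendEnable = false;
    Blend srcBlend = Blend::One;                // don't care if !blendEnable
    Blend destBlend = Blend::Zero;              // don't care if !blendEnable
};

// Reset every don't-care field to the same default value.
PsoDesc Canonicalize(PsoDesc d) {
    const PsoDesc defaults{};
    if (!d.depthEnable) d.depthFunc = defaults.depthFunc;
    if (!d.blendEnable) {
        d.srcBlend = defaults.srcBlend;
        d.destBlend = defaults.destBlend;
    }
    return d;
}

// Pack the canonical description into a cache key (2 bits per enum).
uint32_t PsoKey(const PsoDesc& desc) {
    const PsoDesc d = Canonicalize(desc);
    return static_cast<uint32_t>(d.depthEnable) |
           (static_cast<uint32_t>(d.depthFunc) << 1) |
           (static_cast<uint32_t>(d.blendEnable) << 3) |
           (static_cast<uint32_t>(d.srcBlend) << 4) |
           (static_cast<uint32_t>(d.destBlend) << 6);
}
```

Without the canonicalization step, two materials that disable depth testing but happen to carry different leftover comparison functions would produce two identical PSOs that are compiled and stored twice.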
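
Sorting draw calls by PSO and tessellation usage, and grouping compute work rather than interleaving it with graphics, can be sketched as a sort over a hypothetical Draw record, plus a helper that counts the switches a given order would cause. This assumes the work is order-independent (for example, opaque geometry with no dependencies between the compute and graphics work); a real renderer must still respect ordering constraints such as transparency and resource dependencies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw/dispatch record.
struct Draw {
    bool compute;       // compute dispatch rather than graphics draw
    bool tessellation;  // PSO includes hull/domain shaders
    uint32_t psoId;
};

struct Switches {
    int pso = 0;
    int tess = 0;
    int computeGraphics = 0;
};

// Count the state changes a command list would see in this order.
Switches CountSwitches(const std::vector<Draw>& draws) {
    Switches s;
    for (std::size_t i = 1; i < draws.size(); ++i) {
        const Draw& a = draws[i - 1];
        const Draw& b = draws[i];
        if (a.psoId != b.psoId) ++s.pso;
        if (a.tessellation != b.tessellation) ++s.tess;
        if (a.compute != b.compute) ++s.computeGraphics;
    }
    return s;
}

// Most expensive switches first in the key (compute/graphics, then
// tessellation on/off), then group by PSO within each bucket.
void SortDraws(std::vector<Draw>& draws) {
    std::stable_sort(draws.begin(), draws.end(), [](const Draw& a, const Draw& b) {
        return std::tie(a.compute, a.tessellation, a.psoId) <
               std::tie(b.compute, b.tessellation, b.psoId);
    });
}
```

A stable sort keeps the original submission order among draws that share the same key, which helps when some order within a PSO bucket still matters.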
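
The “miss and update the PSO library” strategy can be modeled in memory as a minimal sketch. PsoBlob and PsoLibrary are hypothetical; in D3D12 the roles loosely correspond to ID3D12PipelineLibrary::LoadGraphicsPipeline (lookup), StorePipeline (insert after a miss) and Serialize (write to disk), but the code below does not call D3D12, and a vector of names stands in for the serialized file. The library grows only with the PSOs the application actually uses, instead of being filled with every permutation and serialized in one large batch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled PSO blob.
struct PsoBlob {
    std::string name;
};

class PsoLibrary {
public:
    // Load entries from a previous run (here: just a list of names).
    explicit PsoLibrary(const std::vector<std::string>& serialized = {}) {
        for (const auto& name : serialized) entries_.emplace(name, PsoBlob{name});
    }

    // Hit: return the stored PSO. Miss: compile it and add it to the library.
    const PsoBlob& GetOrCreate(const std::string& name) {
        auto it = entries_.find(name);
        if (it != entries_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        dirty_ = true;
        return entries_.emplace(name, Compile(name)).first->second;
    }

    // Persist only when something was added, e.g. at a loading screen.
    bool SerializeIfDirty(std::vector<std::string>& out) {
        if (!dirty_) return false;
        out.clear();
        for (const auto& kv : entries_) out.push_back(kv.first);
        dirty_ = false;
        return true;
    }

    int hits() const { return hits_; }
    int misses() const { return misses_; }

private:
    // Hypothetical stand-in for the expensive PSO creation.
    static PsoBlob Compile(const std::string& name) { return PsoBlob{name}; }

    std::map<std::string, PsoBlob> entries_;
    bool dirty_ = false;
    int hits_ = 0;
    int misses_ = 0;
};
```

On the next run, constructing the library from the saved data turns earlier misses into hits, so only PSOs never seen before still need to be compiled.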

Acknowledgments

Thanks to Patrick Neil and Dhiraj Kumar for their advice and assistance.
