This post covers best practices for Vulkan clearing and presenting on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.
With the recent Vulkan 1.3 release, it’s timely to add some Vulkan-specific tips that are not necessarily explicitly covered by the other Advanced API Performance posts. In addition to introducing new Vulkan 1.3 core features, this post shares a set of good practices for clearing and presenting surfaces.
Vulkan 1.3 Core
Vulkan 1.3 promotes a number of extensions to core, improving key parts of the API. This section summarizes our recommendations for obtaining the best performance when working with several of these new features.
Recommended
- Skip framebuffer and render pass object setup by taking advantage of dynamic rendering.
- Reduce the number of pipeline state objects with core support for dynamic states.
- Simplify synchronization and avoid unnecessary image layout transitions by using the improved synchronization API.
Clears
This section provides guidelines for achieving good performance when invoking clear commands. These commands clear a region within a color image or within the bound framebuffer attachments.
Recommended
- Use VK_ATTACHMENT_LOAD_OP_CLEAR to clear attachments at the beginning of a subpass instead of clear commands. This can allow the driver to skip loading unnecessary data.
- Outside of a render pass instance, prefer the usage of vkCmdClearColorImage instead of a CS invocation to clear images. This path enables bandwidth optimizations.
- If possible, batch clears to avoid interleaving single clears between dispatches.
- Coordinate VkClearDepthStencilValue with the test function to achieve better depth testing performance:
  - 0.5 ≤ depth value < 1.0 correlates with VK_COMPARE_OP_LESS_OR_EQUAL
  - 0.0 ≤ depth value < 0.5 correlates with VK_COMPARE_OP_GREATER_OR_EQUAL
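The pairing between the depth clear value and the compare op can be written down as a small lookup. The following is a minimal sketch, not driver or SDK code: the `DepthCompareHint` enum and the `depth_compare_hint_for_clear` function are hypothetical names introduced here so the example compiles without the Vulkan headers, and each enumerator notes the real `VkCompareOp` value it stands for. A clear value of exactly 1.0 falls outside both ranges as they are listed, so the sketch reports no hint for it.

```c
#include <assert.h>

/* Hypothetical stand-in for the two VkCompareOp values named in the text. */
typedef enum DepthCompareHint {
    DEPTH_COMPARE_LESS_OR_EQUAL,    /* stands for VK_COMPARE_OP_LESS_OR_EQUAL */
    DEPTH_COMPARE_GREATER_OR_EQUAL, /* stands for VK_COMPARE_OP_GREATER_OR_EQUAL */
    DEPTH_COMPARE_NO_HINT           /* clear value outside both listed ranges */
} DepthCompareHint;

/* Returns the compare op that the listed ranges pair with a depth clear value.
   The caller is expected to pass a normalized depth in [0, 1]. */
DepthCompareHint depth_compare_hint_for_clear(float clear_depth)
{
    assert(clear_depth >= 0.0f && clear_depth <= 1.0f);

    if (clear_depth >= 0.5f && clear_depth < 1.0f) {
        return DEPTH_COMPARE_LESS_OR_EQUAL;
    }
    if (clear_depth < 0.5f) {
        return DEPTH_COMPARE_GREATER_OR_EQUAL;
    }
    return DEPTH_COMPARE_NO_HINT; /* exactly 1.0 is not covered as listed */
}
```

A helper like this could run in a debug build when a pipeline and its render pass are created, flagging combinations where the clear value and the depth test direction disagree.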
Not recommended
- Specifying more than 30 unique clear values per application (or more than 15 on Turing) does not make the most of clear bandwidth optimizations.
- “Clear shaders” should be avoided unless there is overlap of a compute clear with a neighboring dispatch.
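One way to keep an eye on the unique clear value limit is a small debug-only tracker that records each distinct clear color the application uses. The sketch below is an illustration under assumptions: `ClearColor`, `ClearValueTracker`, and both functions are hypothetical helpers, not part of Vulkan or any NVIDIA SDK, and the sketch assumes that two clear values count as the same only when their four float components match bit for bit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CLEAR_BUDGET_DEFAULT 30 /* limit quoted in the text */
#define CLEAR_BUDGET_TURING  15 /* lower limit quoted for Turing */

/* Hypothetical RGBA clear color, laid out like VkClearColorValue.float32. */
typedef struct ClearColor {
    float rgba[4];
} ClearColor;

/* Hypothetical tracker of the distinct clear colors seen so far. */
typedef struct ClearValueTracker {
    ClearColor seen[CLEAR_BUDGET_DEFAULT];
    size_t unique_count; /* distinct values stored, never above budget */
    size_t budget;       /* CLEAR_BUDGET_DEFAULT or CLEAR_BUDGET_TURING */
    int over_budget;     /* set once a value beyond the budget shows up */
} ClearValueTracker;

void clear_tracker_init(ClearValueTracker *t, size_t budget)
{
    assert(t != NULL && budget <= CLEAR_BUDGET_DEFAULT);
    memset(t, 0, sizeof *t);
    t->budget = budget;
}

/* Records a clear color. Returns 1 while the application has stayed within
   the budget, and 0 from the first value that exceeds it onward. */
int clear_tracker_note(ClearValueTracker *t, ClearColor c)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->unique_count; ++i) {
        /* Bitwise comparison: 0.0f and -0.0f count as different values. */
        if (memcmp(t->seen[i].rgba, c.rgba, sizeof c.rgba) == 0) {
            return !t->over_budget;
        }
    }
    if (t->unique_count == t->budget) {
        t->over_budget = 1;
        return 0;
    }
    t->seen[t->unique_count++] = c;
    return 1;
}
```

Calling `clear_tracker_note` next to each clear in a debug build, and logging when it returns 0, gives an early warning that the application has drifted past the limit.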
Present
This section offers insight into the preferred way of using the presentation modes supported by a surface in order to achieve good performance.
Recommended
- Rely on VK_PRESENT_MODE_FIFO_KHR or VK_PRESENT_MODE_MAILBOX_KHR (for VSync on). Noteworthy aspects:
  - VK_PRESENT_MODE_FIFO_KHR is preferred as it does not drop frames and lacks tearing.
  - VK_PRESENT_MODE_MAILBOX_KHR may offer lower latency, but frames might be dropped.
  - VK_PRESENT_MODE_FIFO_RELAXED_KHR is compelling when your application only occasionally lags behind the refresh rate, allowing tearing so that it can “catch back up”.
- Rely on VK_PRESENT_MODE_IMMEDIATE_KHR for VSync off.
- On Windows systems, use the VK_EXT_full_screen_exclusive extension to bypass compositing.
- Handle both out-of-date and suboptimal swapchains to re-create stale swapchains when windows resize, for example.
- For latency-sensitive applications, use the Vulkan Reflex SDK to minimize latency by completing game engine work just-in-time for rendering.
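The present mode preferences and the swapchain handling above can be sketched as two small decision functions. This is a minimal illustration under stated assumptions, not a definitive implementation: the `PresentMode` enum, the `RESULT_*` macros, and all three functions are hypothetical stand-ins defined here so the code compiles without the Vulkan headers. Their numeric values mirror `VkPresentModeKHR`, `VK_SUBOPTIMAL_KHR`, and `VK_ERROR_OUT_OF_DATE_KHR`. The fallback to FIFO relies on the Vulkan specification requiring every surface to support `VK_PRESENT_MODE_FIFO_KHR`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring the VkPresentModeKHR values. */
typedef enum PresentMode {
    PRESENT_MODE_IMMEDIATE = 0,   /* VK_PRESENT_MODE_IMMEDIATE_KHR */
    PRESENT_MODE_MAILBOX = 1,     /* VK_PRESENT_MODE_MAILBOX_KHR */
    PRESENT_MODE_FIFO = 2,        /* VK_PRESENT_MODE_FIFO_KHR */
    PRESENT_MODE_FIFO_RELAXED = 3 /* VK_PRESENT_MODE_FIFO_RELAXED_KHR */
} PresentMode;

/* Hypothetical stand-ins mirroring two VkResult values. */
#define RESULT_SUBOPTIMAL        (1000001003)  /* VK_SUBOPTIMAL_KHR */
#define RESULT_ERROR_OUT_OF_DATE (-1000001004) /* VK_ERROR_OUT_OF_DATE_KHR */

/* Returns 1 if `wanted` appears in the list reported for the surface. */
int present_mode_available(const PresentMode *available, size_t count,
                           PresentMode wanted)
{
    for (size_t i = 0; i < count; ++i) {
        if (available[i] == wanted) {
            return 1;
        }
    }
    return 0;
}

/* Picks a present mode following the recommendations in the text:
   IMMEDIATE for VSync off, MAILBOX for VSync on when lower latency matters
   more than dropped frames, and FIFO in every other case. */
PresentMode choose_present_mode(const PresentMode *available, size_t count,
                                int vsync, int prefer_low_latency)
{
    assert(available != NULL || count == 0);

    if (!vsync &&
        present_mode_available(available, count, PRESENT_MODE_IMMEDIATE)) {
        return PRESENT_MODE_IMMEDIATE;
    }
    if (vsync && prefer_low_latency &&
        present_mode_available(available, count, PRESENT_MODE_MAILBOX)) {
        return PRESENT_MODE_MAILBOX;
    }
    return PRESENT_MODE_FIFO; /* always supported per the specification */
}

/* Returns 1 when an acquire or present result means the swapchain is stale
   and should be re-created, for example after a window resize. */
int swapchain_needs_recreate(int vk_result)
{
    return vk_result == RESULT_ERROR_OUT_OF_DATE ||
           vk_result == RESULT_SUBOPTIMAL;
}
```

In a real application, the list of available modes would come from `vkGetPhysicalDeviceSurfacePresentModesKHR`, and the result codes from `vkAcquireNextImageKHR` and `vkQueuePresentKHR`. The sketch leaves out `VK_PRESENT_MODE_FIFO_RELAXED_KHR` because choosing it depends on how often the application misses the refresh rate, which is a runtime measurement rather than a static preference.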
More information
For more information about using Vulkan with NVIDIA GPUs, see Vulkan Do’s and Don’ts.
To view the Vulkan API state, use the API Inspector in Nsight Graphics (free download).
With Nsight Systems, you can view Vulkan usage on a unified CPU-GPU timeline, investigate stutter, and track GPU cold spots to their CPU origins. Download Nsight Systems for free.
Acknowledgments
Thanks to Piers Daniell, Ivan Fedorov, Adam Moss, Ryan Prescott, Joshua Schnarr, Juha Sjöholm, and Márton Tamás for their feedback and contributions.