Using Nsight Systems for Fixing Stutters in Games

While working with game developers on pre-release games, NVIDIA has had a steady flow of bugs reported where a game stutters for multiple milliseconds during gameplay. These stutter bugs can ruin the experience of the gamer, possibly making the game unplayable (as with the release of Batman Arkham Knight on PC), so they should be treated with a high priority.

Until 2018, the only tool that NVIDIA DevTech recommended for finding the root-cause of stutters on Windows was GPUView, which ships with the Windows Performance Toolkit.

On the other hand, Nsight Systems can capture and visualize ETW data from DX11, DX12/DXR, VK/VKR, and GL applications, similar to what GPUView can do. It lets you see all the CPU threads that are executing in each frame and what each thread is doing on a regular basis (using stackwalk events), just like GPUView.

On top of that, it lets you easily locate slow frames (by highlighting the frames that have a frame time greater than a given target). You can also locate stutter frames (by highlighting the frames that have a frame time much greater than the surrounding frames). It also captures accumulated CPU times per frame for certain API calls that we know are likely to cause stutters, and it reports these times in the Frame Health row, under the CPU frame duration row. 

In this post, I show how the tool can be used to find the root cause of a stutter bug that we had to fix during the integration of ray-traced reflections in Wolfenstein: Youngblood (VKR).

Capturing a Nsight Systems trace

When you launch Nsight Systems, the tool creates an empty project and you see the form in Figure 1.

Figure 1. Configuring the tool to collect data locally.

To capture data from your local computer, choose Localhost connection. After you’ve chosen Select, the tool sets itself up (which may take a few seconds) and then displays this form that matches the capabilities of your operating system, as shown in Figure 2.

Figure 2. Configuring the data collection.

At this point, we recommend the following steps:

  1. Enter the path to your application EXE file.
  2. To specify any EXE arguments, choose Edit Arguments
  3. Set the working directory.
  4. Leave the checkboxes that are already set.
  5. Keep Collect WDDM trace enabled.
  6. If your app is using Vulkan, enable Collect Vulkan trace.
  7. If your app is using DX12, enable Collect DX12 trace.
  8. Enable Start Profiling Manually and Hotkey (F12) Start/Stop.

When you’ve done all this, choose Start and the tool then launches your application. To start capturing a trace, press F12, repro the stutter bug, and then press F12 again to stop capturing the trace. The tool then immediately starts post-processing the data, which may take a while depending on how long your trace was.

To avoid long post-processing and trace-loading times, I recommend that your traces be no more than one minute long.

Currently, a single trace capture can be captured per application run.

Analyzing a trace

After the post-processing of the capture is complete, the trace is saved to disk as a .qdrep file and the tool opens the trace automatically.

Zooming on a stutter frame

Here’s a trace that was generated from a gameplay session where a >10ms stutter happened in a frame while the user was walking around in a map. When the tool opens the trace, not much is visible, as shown in Figure 3.

Figure 3. Initial view of the trace with the Process data folded.

In the left navigation pane, expand Processes, Frame duration, and CPU frame duration for more information, as shown in Figure 4. The CPU frame duration is the frame time measured on the CPU, from end of Present to end of Present. The GPU frame duration (on the row that contains the GPU name) is the frame time measured from the beginning of the first executed GPU command to the end of the last executed GPU command. Frames with time > target are highlighted in red on the Frame Duration row. Stutter frames are indicated in red on the stutter sub-row.

Figure 4. Frames with time > target are highlighted in red.

The Frame duration row shows a few frames in red. Let’s zoom on the first one. To do this, we can select a time range in the timeline that contains the first red frame, and then drag and drop the left mouse cursor to get this context menu.

Figure 5. Zooming in on a time range.

By choosing Filter and Zoom in, the tool zooms in on the selected time range, as shown in Figure 6.

Figure 6. After zooming in on the neighborhood of a stutter frame.

Let’s now zoom on the red frame (#753) by CTRL-double-clicking on it:

Figure 7. After zooming on the stutter frame.

Identifying the root-cause of the stutter

In the Frame Health row, hover over the red box. We see the ray-tracing accumulated time in the frame, as shown in Figure 8:

Figure 8. Tooltip on the Frame Health row in the stutter frame.

The tooltip can be copied and pasted to text by right-clicking the Frame Health row and selecting the event under Events View in the bottom pane.

So we now know that the root cause of the stutter is calling vkCreateRayTracingPipelinesNV or vkGetAccelerationStructureMemoryRequirementsNV, which add up to 30.8 ms of CPU time in this frame, as measured using instrumented CPU timers for these API calls.

Now, expand all threads. Some less active threads globally may be collapsed into a row labeled N thread hidden. To review those threads, we can double-click that text. Figure 9 shows the long-pole thread in this view:

This screenshot shows the long-pole thread.
Figure 9. The long-pole thread, under Process, Threads.

We can now expand that long-pole thread’s details by choosing the triangle icon to the left of the thread label ([2032] WorkerThread_21). We can now clearly see that the root cause of the long-pole thread that caused the stutter is a call to vkCreateRayTracingPipelinesNV, as shown in Figure 10:

Figure 10. After having expanded the Vulkan API row of the long-pole thread.

If we right-click the Vulkan API row and choose Show In Events View, we see:

Figure 11. Using the Events View on the Vulkan API row of the long-pole thread.

Sorting the events by duration in decreasing order shows the following results, as in Figure 12:

Figure 12. After having sorted the Vulkan API events by duration.

The conclusion is that the Vk API event with the longest duration in this frame is this ray-tracing pipeline creation call taking 30.8 ms of CPU time:

Call to: vkCreateRayTracingPipelines
NV Vulkan API calls
Begins: 12.5591s
Ends: 12.5899s (+30.809 ms)
Flags: NONE
Duration: 30.805 ms

Ray-tracing pipeline creation calls can take a high amount of CPU time because they involve the cost of compiling and linking all the shaders specified in the pipeline. In this case, the fix was simple: at engine initialization time, pre-create all permutations of the ray-tracing pipelines that could possibly be used during gameplay.

Conclusion

Nsight Systems is a powerful tool for determining the root causes of stutters in games. Like GPUView, it can be used to triage any situation where the GPU is idle during a frame due to not being fed data fast enough by the CPU, or due to some graphics API calls triggering expensive CPU work in the driver or operating system.

Finally, unlike GPUView which is available for Windows only, Nsight Systems supports tracing of Vulkan and GL on Linux too. For more information, see the NVIDIA Nsight Systems product page, Nsight Systems – 2019.6 Direct3D11 Feature spotlight video, and other Developer Blog posts such as NVIDIA Nsight Systems Adds Vulkan Support.

Acknowledgements

I want to thank Daniel Horowitz and the Nsight Systems team for their hard work on making this tool more useful to game developers.