
Advanced API Performance: Mesh Shaders


This post covers best practices for mesh shaders on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.

Mesh shaders are a recent addition to the programmable pipeline and aim to overcome the bottlenecks of the fixed layout used by the classical geometry pipeline. The tips below apply to both DirectX and Vulkan developers.

Pipeline diagram shows geometry pipeline of 8 steps vs. mesh shader pipeline of 5 steps.
Figure 1. Mesh shader alternative to the geometry pipeline
  • When segmenting the data into meshlets, use a value of 64 unique vertices and 126 triangle primitives, with intermediate sweet spots of 40 and 84. More important than any single value is organizing the implementation so that it is straightforward to experiment with different segment sizes.
  • Reduce the payload size in amplification and mesh shaders as much as possible:
    • Use bit-packing and quantized representations
    • Replace attributes with barycentrics and allow the pixel shader to fetch and interpolate the attributes
  • The Mesh and Amplification shader stages provide opportunities for LoD selection and further culling strategies. These can be achieved at various granularities, for example:
    • During the amplification shader (AS) stage: cull clusters or make in-pipeline LoD decisions
    • During the mesh shader (MS) stage: cull individual primitives
  • Where it is straightforward, make decisions early, using data that the application already has or can derive. This can save a lot of work down the line. Keep in mind that there is no need to emulate more complex culling schemes that the hardware already performs efficiently by default.
  • Rely on amplification shaders and mesh shaders when dealing with procedural instancing (such as hair or vegetation), iso-surfaces (fluid simulations, voxel data in medical imaging), assets obtained from 3D scans, LoDs, and the highly detailed models often encountered in CAD applications.
  • Take the topology and connectivity of specialized meshes into consideration. Use separate implementations for meshes with dense topology and for meshes with sparse topology, such as particles.
  • Be aware that the amplification shader stage adds overhead, although in general this is negligible.
  • For more information, see Using Mesh Shaders for Professional Graphics.
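
The segmentation guidance above can be sketched on the CPU side as a naive greedy builder whose limits are plain parameters, so that trying 64/126, 40/84, or other sizes only requires changing two values. This is a minimal sketch under stated assumptions, not a production meshlet builder: the names `MeshletLimits`, `Meshlet`, and `buildMeshlets` are hypothetical, the input is assumed to be an indexed triangle list, and the greedy in-order walk is a stand-in for the locality-aware clustering a real tool would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical tuning knobs; the defaults follow the 64 / 126 guidance.
struct MeshletLimits {
    std::size_t maxVertices  = 64;
    std::size_t maxTriangles = 126;
};

// One meshlet: unique global vertex indices plus triangles that index
// into that local list (8-bit local indices cover up to 256 vertices).
struct Meshlet {
    std::vector<std::uint32_t> vertices;     // global vertex indices
    std::vector<std::uint8_t>  localIndices; // 3 entries per triangle
};

// Naive greedy segmentation: walk the index buffer in order and start a
// new meshlet whenever the next triangle would exceed either limit.
std::vector<Meshlet> buildMeshlets(const std::vector<std::uint32_t>& indices,
                                   const MeshletLimits& limits = {}) {
    assert(indices.size() % 3 == 0);
    assert(limits.maxVertices >= 3 && limits.maxVertices <= 256);
    assert(limits.maxTriangles >= 1);

    std::vector<Meshlet> meshlets;
    Meshlet current;
    std::unordered_map<std::uint32_t, std::uint8_t> remap; // global -> local

    auto flush = [&]() {
        if (!current.localIndices.empty()) meshlets.push_back(std::move(current));
        current = Meshlet{};
        remap.clear();
    };

    for (std::size_t i = 0; i < indices.size(); i += 3) {
        // Count the vertices of this triangle that the meshlet lacks.
        std::size_t newVerts = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint32_t v = indices[i + k];
            bool repeated = false; // same index used twice in one triangle
            for (std::size_t j = 0; j < k; ++j) repeated |= (indices[i + j] == v);
            if (!repeated && remap.find(v) == remap.end()) ++newVerts;
        }
        const bool vertsFull =
            current.vertices.size() + newVerts > limits.maxVertices;
        const bool trisFull =
            current.localIndices.size() / 3 + 1 > limits.maxTriangles;
        if (vertsFull || trisFull) flush();

        // Append the triangle, assigning local indices on first use.
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint32_t v = indices[i + k];
            auto it = remap.find(v);
            if (it == remap.end()) {
                const auto local = static_cast<std::uint8_t>(current.vertices.size());
                it = remap.emplace(v, local).first;
                current.vertices.push_back(v);
            }
            current.localIndices.push_back(it->second);
        }
    }
    flush();
    return meshlets;
}
```

Calling `buildMeshlets(indices)` uses the 64/126 defaults, while passing a `MeshletLimits` with `maxVertices = 40` and `maxTriangles = 84` switches to the intermediate configuration without touching the rest of the code.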


  • Compared to DX, the mesh shader in VK_NV_mesh_shader allows arbitrary read and write access to the mesh outputs, which are allocated upfront. You can gain performance by directly using or repurposing these outputs and avoid additional shared memory allocations.
  • Avoid large outputs from the amplification shader, as this can incur a significant performance penalty. Generally, we encourage a flexible implementation that allows for fine-tuning. With that in mind, there are a number of generic factors that impact performance:
    • Size of the payloads. The AS payload should preferably stay below 108 bytes, but if that is not possible, then keep it at least under 236 bytes.
    • Number of invocations of the amplification shader.
    • Number of mesh shaders emitted by the respective amplification shader (amplification rate).
  • Don’t attempt to emulate the fixed-function pipeline using amplification and mesh shaders, as this could potentially add redundancy.
  • Avoid segmenting meshes into new meshlets every frame. Look into baking this data offline, which also allows the meshlets to be optimized for spatial locality or vertex reuse.
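
To make the payload-size guidance concrete, the following host-side sketch compares a naive payload with a bit-packed, quantized one and checks both against the 108-byte and 236-byte budgets at compile time. Everything here is an illustrative assumption: the workgroup size of 32, the struct layouts, and the names `PackedPayload`, `packPayload`, `meshletAt`, and `quantizeUnorm16` are hypothetical, and a real payload would be declared in shader code rather than C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPreferredPayloadBytes = 108; // preferred AS payload budget
constexpr std::size_t kMaxPayloadBytes       = 236; // fallback upper bound
constexpr std::size_t kGroupSize             = 32;  // assumed AS workgroup size

// Naive layout: a full 32-bit meshlet index per surviving meshlet.
struct NaivePayload {
    std::uint32_t meshletIndices[kGroupSize]; // 128 bytes
};

// Packed layout: one 32-bit base, 8-bit offsets relative to it, and a
// LoD blend factor quantized to 16 bits instead of a 32-bit float.
struct PackedPayload {
    std::uint32_t baseMeshlet;
    std::uint16_t lodBlendUnorm16;
    std::uint16_t count;
    std::uint8_t  offsets[kGroupSize];
};

static_assert(sizeof(NaivePayload) > kPreferredPayloadBytes,
              "naive layout already exceeds the preferred budget");
static_assert(sizeof(PackedPayload) <= kPreferredPayloadBytes,
              "packed layout must stay within the preferred budget");
static_assert(sizeof(PackedPayload) <= kMaxPayloadBytes,
              "packed layout must stay within the upper bound");

// Quantize a value in [0, 1] to 16-bit unsigned normalized form.
inline std::uint16_t quantizeUnorm16(float x) {
    const float clamped = std::fmin(std::fmax(x, 0.0f), 1.0f);
    return static_cast<std::uint16_t>(std::lround(clamped * 65535.0f));
}

inline float dequantizeUnorm16(std::uint16_t q) {
    return static_cast<float>(q) / 65535.0f;
}

// Packs surviving meshlet indices. Returns false when the list is empty,
// too long, or spans more than 255 indices, so the caller can fall back.
inline bool packPayload(const std::vector<std::uint32_t>& survivors,
                        float lodBlend, PackedPayload& out) {
    if (survivors.empty() || survivors.size() > kGroupSize) return false;
    const std::uint32_t base =
        *std::min_element(survivors.begin(), survivors.end());
    for (std::uint32_t m : survivors) {
        if (m - base > 255u) return false;
    }
    out = PackedPayload{}; // zero unused offsets
    out.baseMeshlet = base;
    out.lodBlendUnorm16 = quantizeUnorm16(lodBlend);
    out.count = static_cast<std::uint16_t>(survivors.size());
    for (std::size_t i = 0; i < survivors.size(); ++i) {
        out.offsets[i] = static_cast<std::uint8_t>(survivors[i] - base);
    }
    return true;
}

// Reconstructs the i-th meshlet index on the consuming side.
inline std::uint32_t meshletAt(const PackedPayload& p, std::size_t i) {
    assert(i < p.count);
    return p.baseMeshlet + p.offsets[i];
}
```

The compile-time checks are the main point of the sketch: when a field is added to the payload, the build fails as soon as the layout crosses a budget, which keeps the size limits visible while experimenting.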


Thanks to Jakub Boksansky for advice and feedback.
