Understanding Structured Buffer Performance
Structured Buffers were introduced in DirectX 11. They expand compute capabilities, which makes them useful for techniques like tile-based deferred shading, and they offer a convenient way to represent data structures on the GPU that are more than simple colors or 4-component vectors. As such, they are a great tool for GPU programming.
As titles designed to a D3D11 baseline have recently become more common, we’ve noticed a pitfall that developers should be wary of when writing their code. Structured Buffers are by definition tightly packed, so the following code generates a buffer with a stride of 20 bytes:
struct Foo
{
    float4 Position;
    float  Radius;
};
StructuredBuffer <Foo> FooBuf;
That may not seem terrible, but it has performance implications that may not be immediately obvious. Because the stride is not a multiple of 128 bits, element i starts at byte 20 × i, so (assuming the buffer itself starts on a 16-byte boundary) Position is 16-byte aligned only in every fourth element. As a result, Position often spans cache lines, and reading from the structure can be more expensive. While one inefficient read is unlikely to hurt your performance much, the cost adds up quickly. A shader iterating over a list of complex lights, with more than 100 bytes of data per light, can hit a serious pothole. In fact, we recently found prerelease code where whole-frame performance was penalized by over 5% by just such a difference.
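To make the alignment arithmetic concrete, here is a minimal host-side C++ sketch that counts how many elements have a misaligned Position for a given stride. The helper names (`straddles`, `countStraddlers`) are hypothetical, the buffer is assumed to start at offset 0 on an aligned boundary, and the boundary size is a parameter because cache-line sizes vary between GPUs.

```cpp
#include <cstddef>

// True when the byte range [offset, offset + size) crosses a multiple of `boundary`.
constexpr bool straddles(std::size_t offset, std::size_t size, std::size_t boundary)
{
    return offset / boundary != (offset + size - 1) / boundary;
}

// Counts how many of the first `count` elements have their leading 16-byte
// Position field crossing a `boundary`-byte line, for a given element stride.
// Assumption: Position is the first member and the buffer starts at offset 0.
constexpr std::size_t countStraddlers(std::size_t stride, std::size_t boundary,
                                      std::size_t count)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (straddles(i * stride, 16, boundary))
            ++n;
    }
    return n;
}

// With a 20-byte stride, three out of every four Positions cross a 16-byte boundary.
static_assert(countStraddlers(20, 16, 64) == 48, "20-byte stride: 48 of 64 misaligned");
// With a 32-byte stride, none do.
static_assert(countStraddlers(32, 16, 64) == 0, "32-byte stride: all aligned");
```

Passing a larger boundary, such as an assumed 64-byte line, gives a rough idea of how often a single Position read touches two lines. The real cost depends on the hardware.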
To avoid these pitfalls in your own code, only a couple of simple steps are required:
- Aim for structures with sizes divisible by 128 bits (sizeof float4)
- Pay attention to internal alignment so that vector types are ‘naturally’ aligned
After padding, the struct has a 32-byte stride:
struct Foo
{
    float4 Position;
    float  Radius;
    float  pad0;
    float  pad1;
    float  pad2;
};
StructuredBuffer <Foo> FooBuf;
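On the CPU side, one way to keep the layout from regressing is to mirror the struct in C++ and check its size at compile time. This is only a sketch: `FooPacked` and `FooPadded` are hypothetical host-side mirrors, and it assumes a 4-byte `float` and that `sizeof` of the mirror is the value you pass as `StructureByteStride` in `D3D11_BUFFER_DESC`.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of the original, tightly packed HLSL struct.
struct FooPacked
{
    float Position[4];
    float Radius;
};

// Hypothetical CPU-side mirror of the padded HLSL struct.
struct FooPadded
{
    float Position[4];
    float Radius;
    float pad0;
    float pad1;
    float pad2;
};

// Assumption: float is 4 bytes, as on the platforms D3D11 targets.
static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// The unpadded layout has the 20-byte stride discussed above.
static_assert(sizeof(FooPacked) == 20, "packed stride is 20 bytes");

// The padded layout is a multiple of 128 bits (16 bytes)...
static_assert(sizeof(FooPadded) == 32, "padded stride is 32 bytes");
static_assert(sizeof(FooPadded) % 16 == 0, "stride should be a multiple of 16 bytes");

// ...and the vector member sits on a 16-byte offset within the element.
static_assert(offsetof(FooPadded, Position) % 16 == 0, "Position should be 16-byte aligned");
```

If someone later adds a member without adjusting the padding, the build fails, so the regression is caught before it reaches a profile.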
You might waste some memory or need slightly more complex code to accomplish these goals, but those costs are generally small compared to the 20% or more of a shader's performance that can be lost by hitting these pitfalls.
If you found this topic interesting, please stay tuned, as we have some additional structured buffer tips coming in a follow-up post.