Reading Between The Threads: Shader Intrinsics
GameWorks, GameWorks Expert Developer, DX11, DX12, Vulkan, OpenGL, nvapi, DesignWorks
Mathias Schott, posted Jul 29 2016
When writing compute shaders, it’s often necessary to communicate values between threads. This is typically done via shared memory. Kepler GPUs introduced “shuffle” intrinsics, which allow the threads of a warp to read each other's registers directly, avoiding both memory access and synchronization. Shared memory is relatively fast, but instructions that operate without touching memory of any kind are significantly faster still.
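To make the contrast concrete, here is a minimal sketch in CUDA rather than in a shading language, since CUDA exposes the same hardware shuffle directly. It sums 32 values across one warp twice: once through shared memory with a barrier after every step, and once with `__shfl_down_sync`, which reads the partner lane's register. The kernel names, the single-warp launch, and the test data are assumptions made for illustration; `__shfl_down_sync` is the CUDA 9+ spelling of the intrinsic (Kepler-era toolkits used `__shfl_down` without the mask argument).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;  // assumed warp width on NVIDIA GPUs

// Variant 1: tree reduction through shared memory.
// Every step writes to memory and needs a barrier before the next read.
__global__ void warpSumShared(const float* in, float* out) {
    __shared__ float scratch[kWarpSize];
    int lane = threadIdx.x;
    scratch[lane] = in[lane];
    __syncthreads();
    for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
        if (lane < offset) scratch[lane] += scratch[lane + offset];
        __syncthreads();  // barrier is outside the branch, so all threads reach it
    }
    if (lane == 0) *out = scratch[0];
}

// Variant 2: the same reduction with shuffles.
// Each lane reads the register of lane + offset; no memory, no barrier.
__global__ void warpSumShuffle(const float* in, float* out) {
    int lane = threadIdx.x;
    float value = in[lane];
    for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
        value += __shfl_down_sync(0xffffffffu, value, offset);
    }
    if (lane == 0) *out = value;  // only lane 0 holds the full sum
}

int main() {
    float hostIn[kWarpSize];
    for (int i = 0; i < kWarpSize; ++i) hostIn[i] = float(i + 1);  // 1..32

    float* devIn = nullptr;
    float* devOut = nullptr;
    cudaMalloc(&devIn, sizeof(hostIn));
    cudaMalloc(&devOut, 2 * sizeof(float));
    cudaMemcpy(devIn, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);

    // One block of exactly one warp per kernel (illustrative launch shape).
    warpSumShared<<<1, kWarpSize>>>(devIn, devOut);
    warpSumShuffle<<<1, kWarpSize>>>(devIn, devOut + 1);

    float hostOut[2] = {0.0f, 0.0f};
    cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);
    printf("shared = %.0f, shuffle = %.0f\n", hostOut[0], hostOut[1]);
    // prints "shared = 528, shuffle = 528" (the sum of 1..32)

    cudaFree(devIn);
    cudaFree(devOut);
    return 0;
}
```

The shuffle variant needs no scratch array and no `__syncthreads()`, which is where the savings described above come from. The sketch covers a single warp only; combining results across several warps would still require shared memory or atomics.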