NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion. Many CUDA programs achieve high performance by taking advantage of warp execution. A new technical blog post shows how to use primitives introduced in CUDA 9 to make warp-level programming safe and effective. While the high performance obtained...