Figure 1: The Tesla V100 Accelerator with Volta GV100 GPU. SXM2 Form Factor.
