GTC Silicon Valley 2019, ID S9839: Discovering the Turing T4 GPU Architecture with Microbenchmarks

Zhe Jia (Citadel)
We'll take a deep dive into previously undisclosed architectural details of NVIDIA's Turing T4 cloud GPU, which we uncovered through microbenchmarks, and compare its features with those of previous generations of NVIDIA GPUs. We'll also reveal the geometry and latency of Turing's complex memory hierarchy, the format of its encoded instructions, and the latency of individual instructions. Learn how developers can use this knowledge to design workloads tailored to the characteristics of the T4 GPU. We'll also explain how to hand-assemble binary code that maximizes dual issue and avoids bank conflicts, squeezing every bit of bare-metal performance from the hardware.
