DEVELOPER BLOG

Tag: Cooperative Groups

HPC

Using CUDA Warp-Level Primitives

NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion. Many CUDA programs achieve high performance by…
16 MIN READ

Accelerated Computing

Register Cache: Caching for Warp-Centric CUDA Programs

In this post we introduce the "register cache", an optimization technique that develops a virtual caching layer for threads in a single warp. It is a software…
16 MIN READ

Accelerated Computing

Cooperative Groups: Flexible CUDA Thread Programming

In efficient parallel algorithms, threads cooperate and share data to perform collective computations. To share data, the threads must synchronize.
16 MIN READ

Accelerated Computing

CUDA 9 Features Revealed: Volta, Cooperative Groups and More

The CUDA 9 release includes support for Volta GPUs, Cooperative Groups programming model extensions, faster libraries, and improved developer tools.
17 MIN READ