Voting and Shuffling to Optimize Atomic Operations

Some years ago I started work on my first CUDA implementation of the Multiparticle Collision Dynamics (MPC) algorithm, a particle-in-cell code used to… 10 MIN READ
GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell

Histograms are an important data representation with many applications in computer vision, data analytics and medical imaging. A histogram is a graphical… 9 MIN READ
CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics

This post introduces warp-aggregated atomics, a useful technique to improve performance when many CUDA threads atomically update a single counter. 14 MIN READ