Tag: Algorithms

Accelerated Computing

Cooperative Groups: Flexible CUDA Thread Programming

In efficient parallel algorithms, threads cooperate and share data to perform collective computations. To share data, the threads must synchronize. 16 MIN READ

Cutting Edge Parallel Algorithms Research with CUDA

Leyuan Wang, a Ph.D. student in the UC Davis Department of Computer Science, presented one of only two “Distinguished Papers” of the 51 accepted at Euro-Par… 14 MIN READ

Voting and Shuffling to Optimize Atomic Operations

Some years ago I started work on my first CUDA implementation of the Multiparticle Collision Dynamics (MPC) algorithm, a particle-in-cell code used to… 10 MIN READ

GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell

Histograms are an important data representation with many applications in computer vision, data analytics and medical imaging. A histogram is a graphical… 9 MIN READ

CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics

This post introduces warp-aggregated atomics, a useful technique to improve performance when many CUDA threads atomically update a single counter. 14 MIN READ

Faster Parallel Reductions on Kepler

Parallel reduction is a common building block for many parallel algorithms. A presentation from 2007 by Mark Harris provided a detailed strategy for… 12 MIN READ