DEVELOPER BLOG

Tag: Shared Memory

HPC

GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell

Histograms are an important data representation with many applications in computer vision, data analytics and medical imaging. A histogram is a graphical… 9 MIN READ
HPC

CUDA Pro Tip: Do The Kepler Shuffle

When writing parallel programs, you will often need to communicate values between parallel threads. The typical way to do this in CUDA programming is to use… 2 MIN READ
Accelerated Computing

Peer-to-Peer Multi-GPU Transpose in CUDA Fortran (Book Excerpt)

This post is an excerpt from Chapter 4 of the book CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica. In this excerpt we… 12 MIN READ
Accelerated Computing

Finite Difference Methods in CUDA C++, Part 2

In the previous CUDA C++ post we dove in to 3D finite difference computations in CUDA C/C++, demonstrating how to implement the x derivative part of the… 6 MIN READ
Accelerated Computing

Finite Difference Methods in CUDA Fortran, Part 2

In the last CUDA Fortran post we dove in to 3D finite difference computations in CUDA Fortran, demonstrating how to implement the x derivative part of the… 6 MIN READ
Accelerated Computing

Finite Difference Methods in CUDA C/C++, Part 1

In the previous CUDA C/C++ post we investigated how we can use shared memory to optimize a matrix transpose, achieving roughly an order of magnitude improvement… 9 MIN READ