Technical Blog
Tag: Shared Memory
Technical Walkthrough
Mar 23, 2022
Boosting Application Performance with GPU Memory Prefetching
NVIDIA GPUs have enormous compute power and typically must be fed data at high speed to deploy that power. That is possible, in principle, because GPUs also...
10 MIN READ
Technical Walkthrough
Mar 17, 2015
GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell
Histograms are an important data representation with many applications in computer vision, data analytics and medical imaging. A histogram is a graphical...
9 MIN READ
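The post above is about building per-block histograms with shared-memory atomics, which Maxwell accelerates, before merging them into global memory. The following is a minimal sketch of that technique, not the post's actual code; the bin count, kernel name, and launch configuration are illustrative assumptions.

#include <cuda_runtime.h>

#define NUM_BINS 256

// Each block accumulates a private histogram in shared memory, keeping the
// high-contention atomics on-chip, then merges it into the global histogram.
__global__ void histogram_smem_atomics(const unsigned char *in, int n,
                                       unsigned int *out)
{
    __shared__ unsigned int smem[NUM_BINS];
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        smem[i] = 0;
    __syncthreads();

    // Grid-stride loop over the input; each value increments one shared bin.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&smem[in[i]], 1u);
    __syncthreads();

    // One global atomic per bin per block to merge the partial histograms.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        atomicAdd(&out[i], smem[i]);
}

A typical launch would be histogram_smem_atomics<<<blocks, 256>>>(d_in, n, d_hist) with d_hist zero-initialized; the shared-memory stage is the part that benefits from fast shared atomics.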
Technical Walkthrough
Feb 03, 2014
CUDA Pro Tip: Do The Kepler Shuffle
When writing parallel programs, you will often need to communicate values between parallel threads. The typical way to do this in CUDA programming is to use...
2 MIN READ
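The teaser above refers to the warp shuffle intrinsics introduced with Kepler, which let threads in a warp exchange register values directly without staging them through shared memory. Below is a minimal sketch of a warp-level sum reduction built on that idea; it uses the modern __shfl_down_sync form (the 2014 post predates the _sync variants), and the helper name is an illustrative assumption.

#include <cuda_runtime.h>

// Warp-wide sum reduction: each step pulls a value from a lane `offset`
// positions higher, halving the number of distinct partial sums.
__device__ float warp_reduce_sum(float val)
{
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;  // lane 0 ends up holding the sum of all 32 lanes
}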
Technical Walkthrough
Jan 01, 2014
Peer-to-Peer Multi-GPU Transpose in CUDA Fortran (Book Excerpt)
This post is an excerpt from Chapter 4 of the book CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica. In this excerpt we...
12 MIN READ
Technical Walkthrough
Apr 08, 2013
Finite Difference Methods in CUDA C++, Part 2
In the previous CUDA C++ post we dove into 3D finite difference computations in CUDA C/C++, demonstrating how to implement the x derivative part of the...
6 MIN READ
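The finite difference posts above tile each pencil of the grid into shared memory, halo points included, before applying the derivative stencil. The sketch below reduces that pattern to a 2nd-order central difference on a 1D periodic array; the posts themselves use higher-order stencils on 3D grids, and the kernel name, pencil size, and boundary treatment here are illustrative assumptions.

#include <cuda_runtime.h>

#define MX 64  // assumed points per pencil, one thread per point

// Loads a pencil of MX points plus one halo cell on each side into shared
// memory, then applies a 2nd-order central difference. Assumes periodic
// boundaries and n == gridDim.x * MX.
__global__ void deriv_x_periodic(const float *f, float *df, int n, float dx)
{
    __shared__ float s_f[MX + 2];

    int i = threadIdx.x;            // index within the pencil
    int g = blockIdx.x * MX + i;    // global index

    s_f[i + 1] = f[g];
    if (i == 0)      s_f[0]      = f[(g - 1 + n) % n];  // left halo
    if (i == MX - 1) s_f[MX + 1] = f[(g + 1) % n];      // right halo
    __syncthreads();

    // df/dx at point g: (f[g+1] - f[g-1]) / (2*dx), read from the shared tile
    df[g] = (s_f[i + 2] - s_f[i]) / (2.0f * dx);
}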
Technical Walkthrough
Apr 01, 2013
Finite Difference Methods in CUDA Fortran, Part 2
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
6 MIN READ