Technical Walkthrough 0

Boosting Application Performance with GPU Memory Prefetching

NVIDIA GPUs have enormous compute power and typically must be fed data at high speed to deploy that power. That is possible, in principle, because GPUs also... 10 MIN READ

GPU Pro Tip: Fast Histograms Using Shared Atomics on Maxwell

Histograms are an important data representation with many applications in computer vision, data analytics and medical imaging. A histogram is a graphical... 9 MIN READ

CUDA Pro Tip: Do The Kepler Shuffle

When writing parallel programs, you will often need to communicate values between parallel threads. The typical way to do this in CUDA programming is to use... 2 MIN READ

Peer-to-Peer Multi-GPU Transpose in CUDA Fortran (Book Excerpt)

This post is an excerpt from Chapter 4 of the book CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica. In this excerpt we... 12 MIN READ

Finite Difference Methods in CUDA C++, Part 2

In the previous CUDA C++ post we dove into 3D finite difference computations in CUDA C/C++, demonstrating how to implement the x derivative part of the... 6 MIN READ

Finite Difference Methods in CUDA Fortran, Part 2

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can... 6 MIN READ