Posts by Greg Ruetsch
Technical Walkthrough
Apr 15, 2021
Using Tensor Cores in CUDA Fortran
This blog describes a CUDA Fortran interface to this same functionality, focusing on the third-generation Tensor Cores of the Ampere architecture.
28 MIN READ
Technical Walkthrough
Nov 16, 2017
Pro Tip: Pinpointing Runtime Errors in CUDA Fortran
We’ve all been there. Your CUDA Fortran code is humming along and suddenly you get a runtime error: , , usually accompanied by in all caps. In many cases…
4 MIN READ
Technical Walkthrough
Mar 05, 2014
CUDA Pro Tip: How to Call Batched cuBLAS routines from CUDA Fortran
When dealing with small arrays and matrices, one method of exposing parallelism on the GPU is to execute the same cuBLAS call on multiple independent systems…
7 MIN READ
Technical Walkthrough
Jan 01, 2014
Peer-to-Peer Multi-GPU Transpose in CUDA Fortran (Book Excerpt)
This post is an excerpt from Chapter 4 of the book CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica. In this excerpt we…
12 MIN READ
Technical Walkthrough
Apr 01, 2013
Finite Difference Methods in CUDA Fortran, Part 2
In the last CUDA Fortran post we dove into 3D finite difference computations in CUDA Fortran, demonstrating how to implement the x derivative part of the…
6 MIN READ
Technical Walkthrough
Feb 26, 2013
Finite Difference Methods in CUDA Fortran, Part 1
In the last CUDA Fortran post we investigated how shared memory can be used to optimize a matrix transpose, achieving roughly an order of magnitude improvement…
9 MIN READ