Posts by Greg Ruetsch
Technical Walkthrough
Apr 15, 2021
Using Tensor Cores in CUDA Fortran
Tensor Cores, which are programmable matrix multiply and accumulate units, were first introduced in the V100 GPUs where they operated on half-precision (16-bit)...
28 MIN READ
Technical Walkthrough
Nov 16, 2017
Pro Tip: Pinpointing Runtime Errors in CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
4 MIN READ
Technical Walkthrough
Mar 05, 2014
CUDA Pro Tip: How to Call Batched cuBLAS routines from CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
7 MIN READ
Technical Walkthrough
Jan 01, 2014
Peer-to-Peer Multi-GPU Transpose in CUDA Fortran (Book Excerpt)
This post is an excerpt from Chapter 4 of the book CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica. In this excerpt we...
12 MIN READ
Technical Walkthrough
Apr 01, 2013
Finite Difference Methods in CUDA Fortran, Part 2
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
6 MIN READ
Technical Walkthrough
Feb 26, 2013
Finite Difference Methods in CUDA Fortran, Part 1
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
9 MIN READ