CUDA Fortran

Nov 16, 2017
Pro Tip: Pinpointing Runtime Errors in CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
4 MIN READ

Sep 29, 2015
Customize CUDA Fortran Profiling with NVTX
The NVIDIA Tools Extension (NVTX) library lets developers annotate custom events and ranges within the profiling timelines generated using tools such as the...
5 MIN READ

Sep 02, 2014
3 Versatile OpenACC Interoperability Techniques
OpenACC is a high-level programming model for accelerating applications with GPUs and other devices using compiler directives to specify...
8 MIN READ

Aug 20, 2014
10 Ways CUDA 6.5 Improves Performance and Productivity
Today we're excited to announce the release of the CUDA Toolkit version 6.5. CUDA 6.5 adds a number of features and improvements to the CUDA platform, including...
7 MIN READ

Aug 13, 2014
Unified Memory: Now for CUDA Fortran Programmers
Unified Memory is a CUDA feature that we've talked a lot about on Parallel Forall. CUDA 6 introduced Unified Memory, which dramatically simplifies GPU...
3 MIN READ

Mar 05, 2014
CUDA Pro Tip: How to Call Batched cuBLAS routines from CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
7 MIN READ

Jan 01, 2014
Peer-to-Peer Multi-GPU Transpose in CUDA Fortran (Book Excerpt)
This post is an excerpt from Chapter 4 of the book CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica. In this excerpt we...
12 MIN READ

Apr 01, 2013
Finite Difference Methods in CUDA Fortran, Part 2
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
6 MIN READ

Feb 26, 2013
Finite Difference Methods in CUDA Fortran, Part 1
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
9 MIN READ

Feb 07, 2013
An Efficient Matrix Transpose in CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
8 MIN READ

Jan 15, 2013
Using Shared Memory in CUDA Fortran
In the previous post, I looked at how global memory accesses by a group of threads can be coalesced into a single transaction, and how alignment and stride...
11 MIN READ

Jan 03, 2013
How to Access Global Memory Efficiently in CUDA Fortran Kernels
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
8 MIN READ

Dec 11, 2012
How to Overlap Data Transfers in CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
12 MIN READ

Nov 29, 2012
How to Optimize Data Transfers in CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
12 MIN READ

Nov 15, 2012
How to Query Device Properties and Handle Errors in CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
8 MIN READ

Nov 05, 2012
How to Implement Performance Metrics in CUDA Fortran
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...
9 MIN READ