CUDA Fortran

Nov 16, 2017

Pro Tip: Pinpointing Runtime Errors in CUDA Fortran

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

4 MIN READ

Sep 29, 2015

Customize CUDA Fortran Profiling with NVTX

The NVIDIA Tools Extension (NVTX) library lets developers annotate custom events and ranges within the profiling timelines generated using tools such as the...

5 MIN READ

Sep 02, 2014

3 Versatile OpenACC Interoperability Techniques

OpenACC is a high-level programming model for accelerating applications with GPUs and other devices using compiler directives to specify...

8 MIN READ

Aug 20, 2014

10 Ways CUDA 6.5 Improves Performance and Productivity

Today we're excited to announce the release of the CUDA Toolkit version 6.5. CUDA 6.5 adds a number of features and improvements to the CUDA platform,...

7 MIN READ

Aug 13, 2014

Unified Memory: Now for CUDA Fortran Programmers

Unified Memory is a CUDA feature that we've talked a lot about on Parallel Forall. CUDA 6 introduced Unified Memory, which dramatically simplifies GPU...

3 MIN READ

Mar 05, 2014

CUDA Pro Tip: How to Call Batched cuBLAS routines from CUDA Fortran

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

7 MIN READ

Jan 01, 2014

Peer-to-Peer Multi-GPU Transpose in CUDA Fortran (Book Excerpt)

This post is an excerpt from Chapter 4 of the book CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica. In this excerpt we...

12 MIN READ

Apr 01, 2013

Finite Difference Methods in CUDA Fortran, Part 2

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

6 MIN READ

Feb 26, 2013

Finite Difference Methods in CUDA Fortran, Part 1

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

9 MIN READ

Feb 07, 2013

An Efficient Matrix Transpose in CUDA Fortran

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

8 MIN READ

Jan 15, 2013

Using Shared Memory in CUDA Fortran

In the previous post, I looked at how global memory accesses by a group of threads can be coalesced into a single transaction, and how alignment and...

11 MIN READ

Jan 03, 2013

How to Access Global Memory Efficiently in CUDA Fortran Kernels

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

8 MIN READ

Dec 11, 2012

How to Overlap Data Transfers in CUDA Fortran

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

12 MIN READ

Nov 29, 2012

How to Optimize Data Transfers in CUDA Fortran

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

12 MIN READ

Nov 15, 2012

How to Query Device Properties and Handle Errors in CUDA Fortran

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

8 MIN READ

Nov 05, 2012

How to Implement Performance Metrics in CUDA Fortran

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can...

9 MIN READ