AI / Deep Learning

Using Tensor Cores in CUDA Fortran

This blog describes a CUDA Fortran interface to this same functionality, focusing on the third-generation Tensor Cores of the Ampere architecture. 28 MIN READ
AI / Deep Learning

N Ways to SAXPY: Demonstrating the Breadth of GPU Programming Options

Back in 2012, NVIDIAN Mark Harris wrote Six Ways to Saxpy, demonstrating how to perform the SAXPY operation on a GPU in multiple ways… 9 MIN READ

Detecting Divergence Using PCAST to Compare GPU to CPU Results

Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. 14 MIN READ

Building and Deploying HPC Applications using NVIDIA HPC SDK from the NVIDIA NGC Catalog

HPC development environments are typically complex configurations composed of multiple software packages, each providing unique capabilities. In addition to the… 17 MIN READ

Accelerating Fortran DO CONCURRENT with GPUs and the NVIDIA HPC SDK

Fortran developers have long been able to accelerate their programs using CUDA Fortran or OpenACC. Now with the latest 20.11 release of the NVIDIA HPC SDK… 13 MIN READ

Accelerating Python on GPUs with nvc++ and Cython

The C++ standard library contains a rich collection of containers, iterators, and algorithms that can be composed to produce elegant solutions to complex… 11 MIN READ