Compilation

Oct 04, 2022

CUDA Toolkit 11.8 New Features Revealed

NVIDIA announces the newest CUDA Toolkit software release, 11.8. This release is focused on enhancing the programming model and CUDA application speedup...

4 MIN READ

Apr 06, 2021

N Ways to SAXPY: Demonstrating the Breadth of GPU Programming Options

Back in 2012, NVIDIAN Mark Harris wrote Six Ways to SAXPY, demonstrating how to perform the SAXPY operation on a GPU in multiple ways, using different...

9 MIN READ

Feb 12, 2021

Boosting Productivity and Performance with the NVIDIA CUDA 11.2 C++ Compiler

The 11.2 CUDA C++ compiler incorporates features and enhancements aimed at improving developer productivity and the performance of GPU-accelerated...

21 MIN READ

Feb 12, 2021

Improving GPU Application Performance with NVIDIA CUDA 11.2 Device Link Time Optimization

CUDA 11.2 features powerful link time optimization (LTO) for device code in GPU-accelerated applications. Device LTO brings the performance...

14 MIN READ

Dec 16, 2020

Enhancing Memory Allocation with New NVIDIA CUDA 11.2 Features

CUDA is the software development platform for building GPU-accelerated applications, providing all the components needed to develop applications targeting...

9 MIN READ

Nov 18, 2020

Detecting Divergence Using PCAST to Compare GPU to CPU Results

Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. The first...

14 MIN READ

Nov 16, 2020

Accelerating Fortran DO CONCURRENT with GPUs and the NVIDIA HPC SDK

Fortran developers have long been able to accelerate their programs using CUDA Fortran or OpenACC. For more up-to-date information, please read Using Fortran...

13 MIN READ

Sep 11, 2019

NVDLA Deep Learning Inference Compiler is Now Open Source

Designing new custom hardware accelerators for deep learning is clearly popular, but achieving state-of-the-art performance and efficiency with a new design is...

6 MIN READ

Oct 25, 2017

High-Performance GPU Computing in the Julia Programming Language

Julia is a high-level programming language for mathematical computing that is as easy to use as Python, but as fast as C. The language has been created with...

10 MIN READ

Aug 01, 2017

Building Cross-Platform CUDA Applications with CMake

Cross-platform software development poses a number of challenges to your application’s build process. How do you target multiple platforms without maintaining...

10 MIN READ

Nov 07, 2016

New Compiler Features in CUDA 8

CUDA 8 is one of the most significant updates in the history of the CUDA platform. In addition to Unified Memory and the many new API and library features in...

17 MIN READ

Jul 13, 2015

Introducing the NVIDIA OpenACC Toolkit

Programmability is crucial to accelerated computing, and NVIDIA's CUDA Toolkit has been critical to the success of GPU computing. Over three million CUDA...

4 MIN READ

Jun 23, 2015

MapD: Massive Throughput Database Queries with LLVM on GPUs

Note: this post was co-written by Alex Şuhan and Todd Mostak of MapD. At MapD our goal is to build the world's fastest big data analytics and visualization...

12 MIN READ

Oct 08, 2014

The Next Wave of Enterprise Performance with Java, POWER Systems, and NVIDIA GPUs

The Java ecosystem is the leading enterprise software development platform, with widespread industry support and deployment on platforms like the IBM WebSphere...

9 MIN READ

Apr 22, 2014

Separate Compilation and Linking of CUDA C++ Device Code

Managing complexity in large programs requires breaking them down into components that are responsible for small, well-defined portions of the overall program....

13 MIN READ

Jun 05, 2013

CUDA Pro Tip: Understand Fat Binaries and JIT Caching

As NVIDIA GPUs evolve to support new features, the instruction set architecture naturally changes. Because applications must run on multiple generations of...

6 MIN READ