News

Announcing Nsight Deep Learning Designer 2021.1 - A Tool for Efficient Deep Learning Model Design and Development

NVIDIA announces Nsight DL Designer – the first-in-class integrated development environment to support efficient design of deep neural networks for in-app… 3 MIN READ
Technical Walkthrough

NVIDIA Tools Extension API: An Annotation Tool for Profiling Code in Python and C/C++

As PyData leverages much of the static language world for speed, including CUDA, we need tools which not only profile and measure across languages but also… 9 MIN READ
Technical Walkthrough

Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager

When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high… 24 MIN READ
Technical Walkthrough

Detecting Divergence Using PCAST to Compare GPU to CPU Results

Parallel Compiler Assisted Software Testing (PCAST) is a feature available in the NVIDIA HPC Fortran, C++, and C compilers. PCAST has two use cases. 14 MIN READ
Technical Walkthrough

Accelerating Standard C++ with GPUs Using stdpar

Historically, accelerating your C++ code with GPUs has not been possible in Standard C++ without using language extensions or additional libraries: CUDA C++… 19 MIN READ
Technical Walkthrough

How to Speed Up Deep Learning Inference Using TensorRT

An introduction to creating accelerated inference engines using TensorRT and C++, with code samples and tutorial links. 22 MIN READ