Accelerated Computing Libraries

Apr 19, 2024

Measuring the GPU Occupancy of Multi-stream Workloads

NVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multi-processor (SM), the...

11 MIN READ

Apr 11, 2024

New Video Series: OpenUSD for Developers

Universal Scene Description, also called OpenUSD or USD, is an open and extensible framework for creating, editing, querying, rendering, collaborating, and...

3 MIN READ

Mar 27, 2024

Efficient CUDA Debugging: Using NVIDIA Compute Sanitizer with NVIDIA Tools Extension and Creating Custom Tools

NVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications....

14 MIN READ

Mar 25, 2024

Building High-Performance Applications in the Era of Accelerated Computing

AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements...

6 MIN READ

Mar 08, 2024

cuTENSOR 2.0: Applications and Performance

While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically...

9 MIN READ

Mar 08, 2024

cuTENSOR 2.0: A Comprehensive Guide for Accelerating Tensor Computations

NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array...

17 MIN READ

Oct 22, 2023

Differentiable Slang: Example Applications

Differentiable Slang easily integrates with existing codebases—from Python, PyTorch, and CUDA to HLSL—to aid multiple computer graphics tasks and enable...

6 MIN READ

Oct 22, 2023

Differentiable Slang: A Shading Language for Renderers That Learn

NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language...

12 MIN READ

Oct 05, 2023

Power Optimization with NVIDIA Jetson

When working with embedded systems such as the Jetson modules, you must optimize your application based on your power budget and compute resources. To avoid...

7 MIN READ

Oct 02, 2023

Accelerated Vector Search: Approximating with RAPIDS RAFT IVF-Flat

Performing an exhaustive exact k-nearest neighbor (kNN) search, also known as brute-force search, is expensive, and it doesn’t scale particularly well to...

15 MIN READ

Sep 25, 2023

New Video Series: CUDA Developer Tools Tutorials

GPU acceleration is enabling faster and more intelligent applications than ever before, and the CUDA Toolkit is key to harnessing acceleration on NVIDIA GPUs....

3 MIN READ

Sep 07, 2023

NVIDIA CUDA Toolkit Symbol Server

NVIDIA has already made available a GPU driver binary symbols server for Windows. Now, NVIDIA is making available a repository of CUDA Toolkit symbols for...

3 MIN READ

Jun 29, 2023

Efficient CUDA Debugging: How to Hunt Bugs with NVIDIA Compute Sanitizer

Debugging code is a crucial aspect of software development but can be both challenging and time-consuming. Parallel programming with thousands of threads can...

14 MIN READ

Jun 27, 2023

GPU-Accelerated Single-Cell RNA Analysis with RAPIDS-singlecell

Single-cell sequencing has become one of the most prominent technologies used in biomedical research. Its ability to decipher changes in the transcriptome and...

13 MIN READ

Jun 12, 2023

Distributed Deep Learning Made Easy with Spark 3.4

Apache Spark is an industry-leading platform for distributed extract, transform, and load (ETL) workloads on large-scale data. However, with the advent of deep...

7 MIN READ

Jun 07, 2023

Predicting Credit Defaults Using Time-Series Models with Recursive Neural Networks and XGBoost

Today’s machine learning (ML) solutions are complex and rarely use just a single model. Training models effectively requires large, diverse datasets that may...

12 MIN READ