Accelerated Computing Libraries
Apr 19, 2024
Measuring the GPU Occupancy of Multi-stream Workloads
NVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multiprocessor (SM), the...
11 MIN READ
Apr 11, 2024
New Video Series: OpenUSD for Developers
Universal Scene Description, also called OpenUSD or USD, is an open and extensible framework for creating, editing, querying, rendering, collaborating, and...
3 MIN READ
Mar 27, 2024
Efficient CUDA Debugging: Using NVIDIA Compute Sanitizer with NVIDIA Tools Extension and Creating Custom Tools
NVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications....
14 MIN READ
Mar 25, 2024
Building High-Performance Applications in the Era of Accelerated Computing
AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements...
6 MIN READ
Mar 08, 2024
cuTENSOR 2.0: Applications and Performance
While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically...
9 MIN READ
Mar 08, 2024
cuTENSOR 2.0: A Comprehensive Guide for Accelerating Tensor Computations
NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array...
17 MIN READ
Oct 22, 2023
Differentiable Slang: Example Applications
Differentiable Slang easily integrates with existing codebases—from Python, PyTorch, and CUDA to HLSL—to aid multiple computer graphics tasks and enable...
6 MIN READ
Oct 22, 2023
Differentiable Slang: A Shading Language for Renderers That Learn
NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language...
12 MIN READ
Oct 05, 2023
Power Optimization with NVIDIA Jetson
When working with embedded systems such as the Jetson modules, you must optimize your application based on your power budget and compute resources. To avoid...
7 MIN READ
Oct 02, 2023
Accelerated Vector Search: Approximating with RAPIDS RAFT IVF-Flat
Performing an exhaustive exact k-nearest neighbor (kNN) search, also known as brute-force search, is expensive, and it doesn’t scale particularly well to...
15 MIN READ
Sep 25, 2023
New Video Series: CUDA Developer Tools Tutorials
GPU acceleration is enabling faster and more intelligent applications than ever before, and the CUDA Toolkit is key to harnessing acceleration on NVIDIA GPUs....
3 MIN READ
Sep 07, 2023
NVIDIA CUDA Toolkit Symbol Server
NVIDIA has already made available a GPU driver binary symbols server for Windows. Now, NVIDIA is making available a repository of CUDA Toolkit symbols for...
3 MIN READ
Jun 29, 2023
Efficient CUDA Debugging: How to Hunt Bugs with NVIDIA Compute Sanitizer
Debugging code is a crucial aspect of software development but can be both challenging and time-consuming. Parallel programming with thousands of threads can...
14 MIN READ
Jun 27, 2023
GPU-Accelerated Single-Cell RNA Analysis with RAPIDS-singlecell
Single-cell sequencing has become one of the most prominent technologies used in biomedical research. Its ability to decipher changes in the transcriptome and...
13 MIN READ
Jun 12, 2023
Distributed Deep Learning Made Easy with Spark 3.4
Apache Spark is an industry-leading platform for distributed extract, transform, and load (ETL) workloads on large-scale data. However, with the advent of deep...
7 MIN READ
Jun 07, 2023
Predicting Credit Defaults Using Time-Series Models with Recursive Neural Networks and XGBoost
Today’s machine learning (ML) solutions are complex and rarely use just a single model. Training models effectively requires large, diverse datasets that may...
12 MIN READ