Accelerated Computing Libraries
Aug 01, 2024
Just Released: CUDA Toolkit 12.6
The release supports GB100 capabilities and new library enhancements to cuBLAS, cuFFT, cuSOLVER, and cuSPARSE, as well as the release of Nsight Compute 2024.3.
1 MIN READ
Jul 11, 2024
Next Generation of FlashAttention
NVIDIA is excited to collaborate with Colfax, Together.ai, Meta, and Princeton University on their recent achievement in exploiting the Hopper GPU architecture and...
1 MIN READ
Apr 19, 2024
Measuring the GPU Occupancy of Multi-stream Workloads
NVIDIA GPUs are becoming increasingly powerful with each new generation. This increase generally comes in two forms. Each streaming multiprocessor (SM), the...
11 MIN READ
Apr 11, 2024
New Video Series: OpenUSD for Developers
Universal Scene Description, also called OpenUSD or USD, is an open and extensible framework for creating, editing, querying, rendering, collaborating, and...
3 MIN READ
Mar 27, 2024
Efficient CUDA Debugging: Using NVIDIA Compute Sanitizer with NVIDIA Tools Extension and Creating Custom Tools
NVIDIA Compute Sanitizer is a powerful tool that can save you time and effort while improving the reliability and performance of your CUDA applications....
14 MIN READ
Mar 25, 2024
Building High-Performance Applications in the Era of Accelerated Computing
AI is augmenting high-performance computing (HPC) with novel approaches to data processing, simulation, and modeling. Because of the computational requirements...
6 MIN READ
Mar 08, 2024
cuTENSOR 2.0: Applications and Performance
While part 1 focused on the usage of the new NVIDIA cuTENSOR 2.0 CUDA math library, this post introduces a variety of usage modes beyond that, specifically...
9 MIN READ
Mar 08, 2024
cuTENSOR 2.0: A Comprehensive Guide for Accelerating Tensor Computations
NVIDIA cuTENSOR is a CUDA math library that provides optimized implementations of tensor operations where tensors are dense, multi-dimensional arrays or array...
17 MIN READ
Oct 22, 2023
Differentiable Slang: Example Applications
Differentiable Slang easily integrates with existing codebases—from Python, PyTorch, and CUDA to HLSL—to aid multiple computer graphics tasks and enable...
6 MIN READ
Oct 22, 2023
Differentiable Slang: A Shading Language for Renderers That Learn
NVIDIA just released a SIGGRAPH Asia 2023 research paper, SLANG.D: Fast, Modular and Differentiable Shader Programming. The paper shows how a single language...
12 MIN READ
Oct 05, 2023
Power Optimization with NVIDIA Jetson
When working with embedded systems such as the Jetson modules, you must optimize your application based on your power budget and compute resources. To avoid...
7 MIN READ
Oct 02, 2023
Accelerated Vector Search: Approximating with RAPIDS cuVS IVF-Flat
Performing an exhaustive exact k-nearest neighbor (kNN) search, also known as brute-force search, is expensive, and it doesn’t scale particularly well to...
15 MIN READ
Sep 25, 2023
New Video Series: CUDA Developer Tools Tutorials
GPU acceleration is enabling faster and more intelligent applications than ever before, and the CUDA Toolkit is key to harnessing acceleration on NVIDIA GPUs....
3 MIN READ
Sep 07, 2023
NVIDIA CUDA Toolkit Symbol Server
NVIDIA has already made available a GPU driver binary symbols server for Windows. Now, NVIDIA is making available a repository of CUDA Toolkit symbols for...
3 MIN READ
Jun 29, 2023
Efficient CUDA Debugging: How to Hunt Bugs with NVIDIA Compute Sanitizer
Debugging code is a crucial aspect of software development but can be both challenging and time-consuming. Parallel programming with thousands of threads can...
14 MIN READ
Jun 27, 2023
GPU-Accelerated Single-Cell RNA Analysis with RAPIDS-singlecell
Single-cell sequencing has become one of the most prominent technologies used in biomedical research. Its ability to decipher changes in the transcriptome and...
13 MIN READ