NVIDIA CUDA-X Libraries
Built on the foundation of NVIDIA® CUDA®, NVIDIA CUDA-X™ is a powerful suite of libraries designed to deliver industry-leading GPU acceleration across AI and high-performance computing use cases—from generative AI and autonomous machines to climate modeling and financial forecasting. Whether you're deploying on resource-constrained IoT devices or the world's largest supercomputers, NVIDIA CUDA-X libraries provide highly optimized implementations of complex algorithms that far outperform CPU-only alternatives. For developers looking to build or scale applications, CUDA-X offers the most efficient and accessible path to maximizing hardware performance across any domain.
Components
CUDA Math Libraries
GPU-accelerated math libraries lay the foundation for compute-intensive applications in areas such as molecular dynamics, computational fluid dynamics, computational chemistry, medical imaging, and seismic exploration.
nvmath-python
Enabling GPU-accelerated math operations for the Python ecosystem. nvmath-python (Beta) is an open source library that provides high-performance access to the core mathematical operations in the NVIDIA math libraries.
Scientific Computing Libraries
GPU-accelerated libraries for scientific applications, including neural networks that respect mathematical symmetries (specifically, equivariance under transformations of 3D geometric data such as molecular structures, proteins, and materials).
cuEquivariance
An open source Python library designed to accelerate the construction and execution of geometry-aware neural networks, particularly those handling data rotations and translations in 3D space.
NVIDIA ALCHEMI
A collection of domain-specific NVIDIA NIM™ microservices and a toolkit for accelerating chemical and materials discovery (e.g., battery materials, catalysts, OLEDs, beauty formulations).
cuLitho
Targeting the modern-day challenges of nanoscale computational lithography, this library optimizes tools and algorithms to accelerate computational lithography and the manufacturing of semiconductors using GPUs.
cuEST
Accelerates industrial-scale quantum chemistry via a flexible API and high-performance building blocks for first-principles electronic structure calculations on GPUs.
Physics Libraries
GPU-accelerated physics libraries and frameworks to speed up simulations across domains including computational physics, multiphysics, quantum physics, and weather modeling.
NVIDIA Warp
A purpose-built, open source Python framework that delivers GPU acceleration for computational physics, AI, and optimization workflows, enabling kernel-based programs for simulation AI, robotics, and ML.
NVIDIA PhysicsNeMo
An open source Python framework for building, training, and fine-tuning AI physics models at scale.
NVIDIA Earth-2
A comprehensive family of open models, libraries, and frameworks that democratize global access to professional-grade weather and climate AI.
Quantum Computing Libraries
Enabling simulation, HPC integration, and AI for quantum computing.
cuQuantum
A set of highly optimized libraries for accelerating quantum computing simulations.
cuPQC
SDK of optimized libraries for accelerating post-quantum cryptography (PQC) workflows.
CUDA-Q QEC
Libraries for simulating and implementing noise-resilient quantum algorithms and error mitigation.
CUDA-Q Solvers
GPU-accelerated solvers for hybrid quantum-classical optimization and variational workloads.
Deep Learning Core Libraries
GPU-accelerated libraries for deep learning applications that use CUDA and specialized hardware components of GPUs.
NVIDIA cuDNN
GPU-accelerated library of deep neural network building blocks ("primitives") for deep learning.
NVIDIA TensorRT™ and TensorRT LLM
High-performance deep learning inference optimizer and runtime for production deployment.
CUTLASS
Modular C++ templates and Python DSLs for building high-performance kernels targeting NVIDIA Tensor Cores.
FlashInfer
GPU-accelerated kernel library for inference, accessible via a Python API, that optimizes attention, MoEs, GEMMs, communication, and other neural network operations.
Parallel Algorithm Libraries
The CUDA Core Compute Libraries (CCCL) provide GPU-accelerated algorithms in C++ and Python. They provide optimized parallel primitives to solve complex challenges in natural sciences, logistics, travel planning, and more.
Thrust
Powerful data-parallel library based on the C++ STL that lets developers implement complex GPU-accelerated algorithms in a high-level API without sacrificing performance.
CUB
Cooperative primitives for CUDA kernel authoring, providing warp-wide, block-wide, and device-wide collective primitives across the CUDA programming model.
cuda.compute
Pythonic interface for high-performance, device-level CCCL algorithms, enabling developers to leverage CUDA parallel processing directly within Python workflows.
cuda.parallel
Standardized primitives for distributed and local parallel patterns such as sort, scan, and reduction, optimized for the latest NVIDIA GPU architectures.
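To make the patterns named above concrete, here is a minimal CPU-only sketch of what a reduction, a scan, and a sort compute. It uses only the Python standard library and is not the CCCL, cuda.compute, or cuda.parallel API; the function names `reduction`, `inclusive_scan`, and `exclusive_scan` are hypothetical and chosen for illustration. The GPU libraries compute the same results in parallel rather than sequentially.

```python
from functools import reduce
from itertools import accumulate
import operator


def reduction(values, op=operator.add, init=0):
    """Combine all elements into a single value (sequential reference)."""
    return reduce(op, values, init)


def inclusive_scan(values, op=operator.add):
    """Running result where output[i] includes values[i]."""
    return list(accumulate(values, op))


def exclusive_scan(values, op=operator.add, init=0):
    """Running result where output[i] covers only values[0..i-1]."""
    # accumulate(..., initial=init) yields len(values) + 1 items; drop the last.
    return list(accumulate(values, op, initial=init))[:-1]


data = [3, 1, 4, 1, 5]
total = reduction(data)          # 14
prefix = inclusive_scan(data)    # [3, 4, 8, 9, 14]
offsets = exclusive_scan(data)   # [0, 3, 4, 8, 9]
ordered = sorted(data)           # [1, 1, 3, 4, 5]
```

An exclusive scan over element counts is a common way to compute output offsets, which is one reason scan appears alongside sort and reduction as a basic building block.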
Data Processing Libraries
GPU-accelerated libraries to accelerate data processing workflows for tabular, text, and image data.
cuDF
Accelerate tabular data processing, including pandas, Polars, and Apache Spark workloads, with zero code changes.
cuVS
Accelerate vector search for data mining and semantic search applications—including world-class performance from the GPU-native nearest neighbors algorithm CAGRA.
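As a point of reference for what vector search computes, the following is a minimal CPU-only sketch of exact (brute-force) nearest-neighbor search in plain Python. It is not the cuVS API, and the helper names `squared_l2` and `exact_knn` are hypothetical. Approximate nearest-neighbor algorithms such as CAGRA aim to return results close to this exact answer without comparing the query against every stored vector.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(query, vectors, k):
    """Return the indices of the k vectors closest to query, nearest first."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: squared_l2(query, vectors[i]))
    return ranked[:k]


# Toy 2D dataset; real workloads use embeddings with hundreds of dimensions.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
neighbors = exact_knn([0.9, 0.1], vectors, k=2)  # [1, 0]
```

The brute-force version scans every vector per query, so its cost grows linearly with the dataset; that cost is what index-based approaches are designed to avoid.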
cuML
Speed up ML algorithms in scikit-learn, UMAP, HDBSCAN, and Apache Spark with zero code changes.
cuOpt
Open source, GPU-accelerated decision optimization engine designed to tackle large-scale problems with millions of variables and constraints, enabling accelerated decision-making.
cuGraph
Scale up and speed up graph analytics with GPU-accelerated NetworkX.
NeMo Curator
Improves generative AI model accuracy by processing text, image, and video data at scale for training and customization, with pre-built pipelines for generating synthetic data.
Morpheus
Open application framework that optimizes cybersecurity AI pipelines for analyzing large volumes of real-time data.
nvComp
High-throughput GPU-accelerated compression and decompression library that minimizes storage footprint and speeds up data transfer rates for AI training, HPC, data science, and analytics applications.
GPUDirect Storage
NVIDIA GPUDirect Storage creates a direct data path between local or remote storage, such as NVMe or NVMe over Fabrics (NVMe-oF), and GPU memory.
Dask
Expand data science pipelines to multiple nodes with NVIDIA RAPIDS on Dask.
Image and Video Libraries
GPU-accelerated libraries for image and video decoding, encoding, and processing that use CUDA and specialized hardware components of GPUs.
nvImageCodec
GPU-accelerated image codec library with a unified interface for high-throughput image encoding and decoding, built as an extensible framework for a wide array of codec plugins.
NVIDIA DALI
GPU-accelerated library for data loading and pre-processing to accelerate deep learning applications for image, video, and audio modalities.
CV-CUDA
Open source library for high-performance, GPU-accelerated pre- and post-processing in vision AI pipelines.
cuCIM
Open source, accelerated computer vision and image processing library for multidimensional images in biomedical, geospatial, material, healthcare, and remote sensing use cases.
NVIDIA Performance Primitives (NPP)
GPU-accelerated library of highly optimized primitives for CUDA-based 2D image and signal processing, including filtering, color conversion, and image manipulation.
NVIDIA Video Codec SDK
Hardware-accelerated video encode and decode on Windows and Linux.
NVIDIA Optical Flow SDK
Exposes the latest hardware capability of NVIDIA GPUs dedicated to computing the relative motion of pixels between images.
Communication Libraries
Performance-optimized multi-GPU and multi-node communication primitives.
NVSHMEM
Based on the OpenSHMEM one-sided communication model, provides a partitioned global address space across GPU memories.
NCCL
Open source library for fast multi-GPU, multi-node communication that maximizes bandwidth while maintaining low latency.
NIXL
Low-latency inference transfer library, moving KV cache and tensors between GPUs, memory tiers, and storage.
Partner Libraries
OpenCV
GPU-accelerated open-source library for computer vision, image processing, and machine learning, now supporting real-time operation.
FFmpeg
Open-source multimedia framework with a library of plug-ins for audio and video processing.
ArrayFire
GPU-accelerated open-source library for matrix, signal, and image processing.
MAGMA
GPU-accelerated linear algebra routines for heterogeneous architectures, by Magma.
IMSL Fortran Numerical Library
GPU-accelerated Fortran library with functions for math, signal and image processing, and statistics, by RogueWave.
Gunrock
Library for graph-processing designed specifically for the GPU.
CHOLMOD
GPU-accelerated functions for sparse direct solvers, included in the SuiteSparse linear algebra package, authored by Prof.
Triton Ocean SDK
Real-time visual simulation of oceans and other water bodies for games, simulation, and training applications, by Triton.
CUVIlib
Primitives for accelerating imaging applications in medical, industrial, and defense domains.
CuPy
Open source array library for GPU-accelerated computing with Python, providing a NumPy/SciPy-compatible interface.
Get Started
Members of the NVIDIA Developer Program get early access to all CUDA library releases and the NVIDIA online bug reporting and feature request system.