SDK Glossary

NVIDIA pioneered accelerated computing by extending the most successful parallel processor in history, the GPU, to general-purpose computing. However, accelerated computing requires more than just powerful chips. We help developers achieve incredible speedups through full-stack invention, from the chips and systems to the algorithms and applications they run. These algorithms are optimized and packaged into software development kits (SDKs), where they help millions of developers across industries solve complex problems.

Browse NVIDIA’s extensive suite of SDKs below:

Aerial cuBB
The CUDA Baseband (cuBB) SDK provides a GPU-accelerated 5G signal processing pipeline, including cuPHY for Layer 1 5G PHY. It delivers unprecedented throughput and efficiency by keeping all physical layer processing within the high-performance GPU memory.
Aerial cuVNF
The CUDA Virtual Network Functions (cuVNF) SDK provides optimized input/output and packet processing, exchanging packets directly between GPU memory and GPUDirect®-capable NVIDIA ConnectX®-6 DX network interface cards.
Aerial Developer Kit
NVIDIA Aerial is an application framework for building high-performance, software-defined, cloud-native 5G applications to address increasing consumer demand. Optimize your results with parallel processing of baseband signals and data flow on the GPU. Designed to jumpstart performance evaluation and benchmarking for RAN development, the NVIDIA Aerial™ Developer Kit includes preconfigured software and test vectors to deliver an out-of-the-box guided experience.
Ansel
NVIDIA Ansel is a revolutionary way to capture in-game shots and share the moment. Compose your screenshots from any position, adjust them with post-process filters, capture HDR images in high-fidelity formats, and share them in 360 degrees using your mobile phone, PC, or VR headset.
Capture SDK
NVIDIA Capture SDK (formerly GRID SDK) enables developers to easily and efficiently capture, and optionally encode, the display content.
Clara AGX
The NVIDIA Clara AGX™ developer kit delivers real-time AI and imaging for medical devices. By combining the low-power NVIDIA Jetson AGX Xavier and an RTX GPU with the NVIDIA Clara AGX SDK and the NVIDIA EGX stack, it’s easy to securely provision and remotely manage fleets of distributed medical instruments.
Clara Deploy
An extensible reference development framework that facilitates turning AI models into AI-powered clinical workflows with built-in support for DICOM communication and the ability to interface with existing hospital infrastructures.
Clara Discovery
Clara Discovery is a collection of frameworks, applications, and AI models enabling GPU-accelerated computational drug discovery. Drug development is a cross-disciplinary endeavor. Clara Discovery can be applied across the drug discovery process and combines accelerated computing, AI and machine learning in genomics, proteomics, microscopy, virtual screening, computational chemistry, visualization, clinical imaging, and natural language processing.
Clara Guardian
NVIDIA Clara™ Guardian is an application framework and partner ecosystem that simplifies the development and deployment of smart sensors with multimodal AI, anywhere in a healthcare facility. With a diverse set of pre-trained models, reference applications, and fleet management solutions, developers can build solutions faster—bringing AI to healthcare facilities and improving patient care.
Clara Holoscan
NVIDIA Clara™ Holoscan is a hybrid computing platform for medical devices that combines hardware systems for low-latency sensor and network connectivity, optimized libraries for data processing and AI, and core microservices to run streaming, imaging, and other applications anywhere, from embedded to edge to cloud.
Clara Parabricks
NVIDIA Clara™ Parabricks is a computational framework supporting genomics applications from DNA to RNA. It employs NVIDIA’s CUDA, HPC, AI, and data analytics stacks to build GPU accelerated libraries, pipelines, and reference application workflows for primary, secondary, and tertiary analysis. Clara Parabricks is a complete portfolio of off-the-shelf solutions coupled with a toolkit to support new application development to address the needs of genomic labs.
Clara Train
Clara Train is a domain-optimized developer application framework that includes APIs for AI-assisted annotation, which make any medical viewer AI-capable, and a TensorFlow-based training framework with pre-trained models to kick-start AI development with techniques like transfer learning, federated learning, and AutoML.
Clara Viz
NVIDIA Clara Viz is a high-performance SDK that enables advanced visualization of medical imaging datasets (CT, MR, US) using CUDA-based ray-marching technology to generate high-quality interactive images.
CloudXR
The NVIDIA CloudXR SDK provides a way to stream graphics-intensive augmented reality (AR), virtual reality (VR), or mixed reality (MR) content, often called XR, over a radio signal (5G or Wi-Fi) or Ethernet. The SDK enables immediate streaming of OpenVR applications to a number of 5G-connected Android devices, providing the benefits of graphics-intensive applications on relatively low-powered graphics hardware.
CUB
CUB provides collective primitives and utilities. It is specific to CUDA C++, and its interfaces explicitly accommodate CUDA-specific features.
cuBLAS
The cuBLAS Library provides a GPU-accelerated implementation of the basic linear algebra subroutines (BLAS). cuBLAS accelerates AI and HPC applications with drop-in industry standard BLAS APIs highly optimized for NVIDIA GPUs. The cuBLAS library contains extensions for batched operations, execution across multiple GPUs, and mixed and low precision execution. Using cuBLAS, applications automatically benefit from regular performance improvements and new GPU architectures. The cuBLAS library is included in both the NVIDIA HPC SDK and the CUDA Toolkit.
cuCIM
An extensible toolkit designed to provide GPU-accelerated I/O, computer vision, and image processing primitives for N-dimensional images with a focus on biomedical imaging.
CUDA GraalVM
An open-source prototype called grCUDA that leverages Oracle's GraalVM and exposes GPUs in polyglot environments. While GraalVM can be regarded as the "one VM to rule them all," grCUDA is the "one GPU binding to rule them all."
CUDA Math API
The CUDA Math library is an industry-proven, highly accurate collection of standard mathematical functions.
CUDA on WSL
NVIDIA CUDA on WSL brings NVIDIA CUDA advanced AI and data science developer tools together with the ubiquitous Microsoft Windows platform to deliver advanced machine learning capabilities across numerous industry segments and application domains.
CUDA Python
CUDA® Python provides Cython/Python wrappers for the CUDA driver and runtime APIs and is installable today using pip and Conda. Python developers can leverage massively parallel GPU computing to achieve faster results and greater accuracy.
CUDA Toolkit
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs. The CUDA Toolkit includes GPU-accelerated libraries, a compiler, development tools and the CUDA runtime.
CUDA-X AI
NVIDIA CUDA-X AI is a complete deep learning software stack for researchers and software developers to build high-performance GPU-accelerated applications for conversational AI, recommendation systems, and computer vision. CUDA-X AI libraries deliver world-leading performance for both training and inference across industry benchmarks such as MLPerf.
CUDA-X HPC
CUDA-X HPC is a collection of libraries, tools, compilers and APIs that helps developers solve the world’s most challenging problems. CUDA-X HPC is built on top of CUDA, NVIDIA’s parallel computing platform and programming model. CUDA-X HPC includes highly tuned kernels essential for high-performance computing (HPC).
cuDNN
NVIDIA CUDA® Deep Neural Network (cuDNN) is a GPU-accelerated library of primitives for deep neural networks with highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
cuFFT
The cuFFT Library provides GPU-accelerated FFT implementations that perform up to 10X faster than CPU-only alternatives. cuFFT is used for building commercial and research applications across disciplines such as deep learning, computer vision, computational physics, molecular dynamics, quantum chemistry, and seismic and medical imaging. Using cuFFT, applications automatically benefit from regular performance improvements and new GPU architectures. The cuFFT library is included in both the NVIDIA HPC SDK and the CUDA Toolkit.
cuGraph
Included with RAPIDS, cuGraph focuses on the graph analysis part of data science. cuGraph supports Minimum Spanning Tree (MST)/Maximum Spanning Forest (MSF) and Single Layer Hierarchical Clustering (SLHC) and their applications in finance; egonet extraction and how it ties into graph neural networks; the traveling salesman problem; scaling to multiple GPUs; increased compatibility with external frameworks; and more.
cuNumeric
NVIDIA cuNumeric aspires to be a drop-in replacement library for NumPy, bringing distributed and accelerated computing on the NVIDIA platform to the Python community. The library is used for performing array-based numerical computations. cuNumeric achieves this by translating the NumPy application interface into the Legion programming model and leveraging the performance and scalability of the Legion runtime.
cuOpt
NVIDIA cuOpt™ (formerly called ReOpt) is an AI logistics software API that enables near real-time routing optimizations. cuOpt empowers logistics and operational research developers to leverage larger data sets and faster processing, delivering new capabilities like dynamic rerouting, simulations, and sub-second solver response time for last-mile delivery, supply chain, warehouse picking, and food delivery.
CUPTI
The NVIDIA® CUDA Profiling Tools Interface (CUPTI) is a dynamic library that enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides a set of APIs targeted at ISVs creating profilers and other performance optimization tools.
cuQuantum
cuQuantum is an SDK of optimized libraries and tools for accelerating quantum computing workflows. Developers can use cuQuantum to speed up quantum circuit simulations based on state vector and tensor network methods by orders of magnitude.
cuRAND
The NVIDIA CUDA Random Number Generation library (cuRAND) delivers high-performance GPU-accelerated random number generation (RNG). The cuRAND library delivers high-quality random numbers 8X faster using the hundreds of processor cores available in NVIDIA GPUs. The cuRAND library is included in both the NVIDIA HPC SDK and the CUDA Toolkit.
cuSOLVER
The NVIDIA cuSOLVER library provides a collection of dense and sparse direct linear solvers and eigensolvers which deliver significant acceleration for computer vision, CFD, computational chemistry, and linear optimization applications. The cuSOLVER library is included in both the NVIDIA HPC SDK and the CUDA Toolkit.
cuSPARSELt
cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix. The cuSPARSELt library lets you use the NVIDIA third-generation Tensor Core Sparse Matrix Multiply-Accumulate (SpMMA) operation without the complexity of low-level programming. The library also provides helper functions for pruning and compressing matrices.
cuSPARSE
The cuSPARSE library provides GPU-accelerated basic linear algebra subroutines for sparse matrices that perform significantly faster than CPU-only alternatives. It provides functionality that can be used to build GPU accelerated solvers. cuSPARSE is widely used by engineers and scientists working on applications such as machine learning, computational fluid dynamics, seismic exploration and computational sciences. Using cuSPARSE, applications automatically benefit from regular performance improvements and new GPU architectures. The cuSPARSE library is included in both the NVIDIA HPC SDK and the CUDA Toolkit.
cuTENSOR
The cuTENSOR Library is a GPU-accelerated tensor linear algebra library providing tensor contraction, reduction and elementwise operations. cuTENSOR is used to accelerate applications in the areas of deep learning training and inference, computer vision, quantum chemistry and computational physics. Using cuTENSOR, applications automatically benefit from regular performance improvements and new GPU architectures.
CUTLASS
CUTLASS (CUDA Templates for Linear Algebra Subroutines) is a collection of CUDA C++ templates and abstractions for implementing high-performance GEMM computations at all levels and scales within CUDA kernels.
DALI
NVIDIA Data Loading Library (DALI) is a portable, open-source library for decoding and augmenting images, videos, and speech to accelerate deep learning applications.
DCGM
NVIDIA Data Center GPU Manager (DCGM) is a suite of tools for managing and monitoring NVIDIA datacenter GPUs in cluster environments. It includes active health monitoring, comprehensive diagnostics, system alerts and governance policies including power and clock management. It can be used standalone by infrastructure teams and easily integrates into cluster management tools, resource scheduling and monitoring products from NVIDIA partners.
DeepStream
NVIDIA’s DeepStream SDK delivers a complete streaming analytics toolkit for AI-based multi-sensor processing and video, audio, and image understanding.
DGL Container
Deep Graph Library (DGL) is a framework-neutral, easy-to-use, and scalable Python library used for implementing and training graph neural networks (GNNs). Being framework-neutral, DGL is easily integrated into an existing PyTorch, TensorFlow, or Apache MXNet workflow.
DLProf
GPU utilization is a great starting point for profiling and optimization. You can analyze your models in more detail by employing tools like DLProf and PyProf, and take advantage of user interfaces to visually inspect your code. Deep Learning Profiler (DLProf) provides support for TensorBoard so that you can visually inspect your models.
DLSS
NVIDIA DLSS is a deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games.
DOCA
DOCA is a software framework that enables developers to rapidly create applications and services on top of NVIDIA BlueField® data processing units (DPUs), leveraging industry-standard APIs.
DRIVE Constellation
NVIDIA DRIVE Constellation™ is a dedicated data center platform for AV hardware-in-the-loop (HIL) simulation at scale. It runs NVIDIA DRIVE Sim™.
DRIVE OS
The foundation of the DRIVE Software stack, DRIVE OS is the first safe operating system for in-vehicle accelerated computing. It includes NvMedia for sensor input processing, NVIDIA CUDA® libraries for efficient parallel computing implementations, NVIDIA TensorRT™ for real-time AI inference, and other developer tools and modules to access hardware engines.
DRIVE Sim
NVIDIA DRIVE Sim™ is the core simulation engine; it runs tests on the same AV hardware used in the vehicle to support bit- and timing-accurate AV validation.
DRIVE Software
The open NVIDIA DRIVE® SDK gives developers all the building blocks and algorithmic stacks needed for autonomous driving. It empowers developers to efficiently build and deploy a variety of state-of-the-art AV applications, including perception, localization and mapping, planning and control, driver monitoring, and natural language processing.
DriveWorks
NVIDIA DriveWorks provides middleware functions on top of DRIVE OS that are fundamental to autonomous vehicle development. These consist of the sensor abstraction layer (SAL) and sensor plug-ins, data recorder, vehicle I/O support, and a deep neural network (DNN) framework. It’s modular, open, and designed to be compliant with automotive industry software standards.
DriveWorks SDK
The NVIDIA® DriveWorks SDK is the foundation for all autonomous vehicle (AV) software development. It provides an extensive set of fundamental capabilities, including processing modules, tools and frameworks that are required for advanced AV development.
EGX Software Stack
From the enterprise to the edge, the NVIDIA EGX™ stack delivers a cloud-native platform for GPU-accelerated machine learning, deep learning, and high-performance computing (HPC). Use the EGX stack to quickly and painlessly run GPU-optimized NGC™ containers on NVIDIA-Certified servers.
FLARE
FLARE (Federated Learning Application Runtime Environment) is NVIDIA’s open-source, extensible SDK that allows researchers and data scientists to adapt existing ML/DL workflows to a privacy-preserving federated paradigm. FLARE makes it possible to build robust, generalizable AI models without sharing data.
FleX
FleX is a particle-based simulation technique for real-time visual effects. Traditionally, visual effects are made using a combination of elements created using specialized solvers for rigid bodies, fluids, clothing, etc. Because FleX uses a unified particle representation for all object types, it enables new effects where different simulated substances can interact with each other seamlessly. Such unified physics solvers are a staple of the offline computer graphics world, where tools such as Autodesk Maya’s nCloth and Softimage’s Lagoa are widely used. The goal for FleX is to use the power of GPUs to bring the capabilities of these offline applications to real-time computer graphics.
FME
Feature Map Explorer (FME) enables visualization of 4-dimensional image-based feature map data using a range of views, from low-level channel visualizations to detailed numerical information about each channel slice.
GeForce NOW SDK
The GeForce NOW SDK (GFN SDK) is a means for game developers and publishers to directly integrate with GeForce NOW, NVIDIA’s cloud gaming service. The GFN SDK is ever-evolving to provide easy integration of GeForce NOW features into publisher applications and games, as well as a more efficient way to integrate games into the GeForce NOW ecosystem.
GPUDirect for Video
NVIDIA GPUDirect® is a family of technologies, part of Magnum IO, that enhances data movement and access for NVIDIA data center GPUs.
GPUDirect Storage
Magnum IO GPUDirect® Storage creates a direct data path between local or remote storage, such as NVMe or NVMe over Fabrics (NVMe-oF), and GPU memory. By enabling a direct-memory access (DMA) engine near the network adapter or storage, it moves data into or out of GPU memory—without burdening the CPU.
GVDB Voxels
NVIDIA GVDB Voxels is a new framework for simulation, compute, and rendering of sparse voxels on the GPU.
Highlights
NVIDIA Highlights enables automatic video capture of key moments, clutch kills, and match-winning plays, ensuring gamers’ best gaming moments are always saved. Once a Highlight is captured, gamers can simply share it directly to Facebook, YouTube, or Weibo right from GeForce Experience’s in-game overlay. Additionally, they can also clip their favorite 15 seconds and share it as an animated GIF, all without leaving the game!
HPC SDK
The NVIDIA HPC SDK is a comprehensive suite of compilers, libraries, and tools for HPC.
IndeX
NVIDIA IndeX is a 3D volumetric interactive visualization SDK that allows scientists and researchers to visualize and interact with massive data sets, make real-time modifications, and navigate to the most pertinent parts of the data, all in real time, to gather better insights faster. IndeX leverages GPU clusters for scalable, real-time visualization and computing of multi-valued volumetric data together with embedded geometry data.
IndeX - Amazon Web Services
Available as a custom Amazon Machine Image (AMI) from the AWS Marketplace, the NVIDIA IndeX SDK enables users to modify massive data sets and navigate to the most pertinent parts of the data to gather better insights — all in real time.
IndeX - Google Cloud Marketplace
NVIDIA IndeX is now available on the Google Cloud Marketplace. With the IndeX SDK, scientists and researchers can visualize, interact with, and modify massive data sets, and also navigate to the most pertinent parts of the data — all in real time.
Isaac ROS GEMs
NVIDIA® Isaac ROS GEMs are hardware accelerated packages that make it easier for ROS developers to build high-performance solutions on NVIDIA hardware.
Isaac SDK
NVIDIA Isaac SDK™ is a toolkit that includes building blocks and tools that accelerate the development of robots requiring the increased perception and navigation features enabled by AI.
Isaac Sim
NVIDIA Isaac Sim, powered by Omniverse, is a scalable robotics simulation application and synthetic data generation tool that powers photorealistic, physically-accurate virtual environments to develop, test, and manage AI-based robots.
JetPack
NVIDIA JetPack SDK is the most comprehensive solution for building end-to-end accelerated AI applications. All Jetson modules and developer kits are supported by JetPack SDK.
libcu++
libcu++, the NVIDIA C++ Standard Library, provides a C++ Standard Library for your entire system which can be used in and between CPU and GPU code.
Magnum IO SDK
The NVIDIA Magnum IO™ software development kit (SDK) enables developers to remove input/output (IO) bottlenecks in AI, high performance computing (HPC), data science, and visualization applications, reducing the end-to-end time of their workflows. Magnum IO covers all aspects of data movement between CPUs, GPUs, DPUs, and storage subsystems in virtualized, containerized, and bare-metal environments.
Math Libraries
GPU-accelerated Math Libraries lay the foundation for compute-intensive applications in areas such as molecular dynamics, computational fluid dynamics, computational chemistry, medical imaging, and seismic exploration.
MDL SDK
The NVIDIA Material Definition Language (MDL) is a programming language for defining physically based materials for rendering. The MDL SDK is a set of tools to integrate MDL support into rendering applications. It contains components for loading, inspecting, and editing material definitions, as well as compiling MDL functions to GLSL, HLSL, native x86, PTX, and LLVM-IR. With the NVIDIA MDL SDK, any physically based renderer can easily add support for MDL and join the MDL ecosystem.
Maxine
Project Maxine is a reference application for Omniverse Avatar, a technology platform for generating interactive AI avatars.
Maxine Augmented Reality (AR)
The AR Effects SDK enables developers to create fun and engaging AR effects with real-time 3D tracking of a person’s face and body using a standard web camera.
Maxine Audio Effects
The Audio Effects SDK enables developers to use AI in real time to remove distracting background noise, isolating human speech in incoming and outgoing audio feeds.
Maxine Video Effects
The Video Effects SDK enables developers to apply AI-based visual features in real time, transforming noisy, low-resolution video streams into a pleasant end-user experience: a virtual background of the user’s choice and higher-resolution images with no video noise and reduced artifacts.
Merlin
NVIDIA Merlin™ is an open-source framework for building large-scale deep learning recommender systems, providing an end-to-end reference architecture.
Merlin HugeCTR
Merlin HugeCTR (Huge Click-Through-Rate) is a deep neural network (DNN) training framework designed for recommender systems. It provides distributed training with model-parallel embedding tables, an embeddings cache, and data-parallel neural networks across multiple GPUs and nodes for maximum performance.
Merlin NVTabular
Merlin NVTabular is a feature engineering and preprocessing library designed to effectively manipulate terabytes of recommender system datasets and significantly reduce data preparation time.
Mesh Shading
The Turing architecture introduced a new programmable geometric shading pipeline through the use of mesh shaders. The new shaders bring the compute programming model to the graphics pipeline as threads are used cooperatively to generate compact meshes (meshlets) directly on the chip for consumption by the rasterizer.
Metropolis
NVIDIA Metropolis is an application framework, set of developer tools, and partner ecosystem that brings visual data and AI together to improve operational efficiency and safety across a broad range of industries.
Modulus
Modulus (previously referred to as SimNet) is a framework for developing physics machine learning neural network models, ideal for digital twins.
MONAI
The MONAI framework is the open-source foundation being created by Project MONAI. MONAI is a freely available, community-supported, PyTorch-based framework for deep learning in healthcare imaging. It provides domain-optimized foundational capabilities for developing healthcare imaging training workflows in a native PyTorch paradigm.
MONAI Deploy App SDK
MONAI Deploy App SDK offers a framework and associated tools to design, develop, and verify AI-driven applications in the healthcare imaging domain.
MONAI Label
MONAI Label is an intelligent, open-source image labeling and learning tool: a Python library that enables AI-based labeling of medical imaging data and active learning.
MONAI Stream
MONAI Stream is a Python library that enables researchers to build and test streaming AI applications for medical use cases like ultrasound.
Morpheus
NVIDIA Morpheus is a cybersecurity framework that uses AI to identify, capture, and act on threats and anomalies that were previously impossible to identify.
Mosaic
Whether you want to see your work across multiple displays or project your ideas in 4K, you can with NVIDIA Mosaic™ multi-display technology. With NVIDIA Mosaic, you can easily span any application across up to 16 high-resolution panels or projectors from a single system, conveniently treating the multiple displays as a single desktop, without application software changes or visual artifacts.
NCCL
The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, and reduce-scatter, as well as point-to-point send and receive, that are optimized to achieve high bandwidth and low latency over PCIe and NVLink high-speed interconnects within a node and over NVIDIA Mellanox networks across nodes.
NeMo
NVIDIA NeMo is a framework for developers to build and train state-of-the-art conversational AI models.
NeMo Megatron
NeMo Megatron is the fastest framework for training large language models and can efficiently train models with billions or trillions of parameters. It provides data curation as well as parallelism techniques (data, tensor, and pipeline parallelism).
NGC AI Models
State-of-the-art AI models from NVIDIA NGC help data scientists and developers quickly build custom models or use them as is for inference.
NGC Pix2PixHD
Pix2PixHD is a PyTorch implementation of a deep learning-based method for high-resolution (e.g. 2048×1024) photorealistic image-to-image translation.
NGC TensorFlow
NVIDIA works with Google and the community to accelerate TensorFlow on NVIDIA GPUs and makes it available both in open source and as ready-to-run containers. NVIDIA also has an open source project to accelerate TensorFlow 1.x on new NVIDIA GPUs.
NGX SDK
NVIDIA NGX is a new deep-learning-powered technology stack bringing AI-based features that accelerate and enhance graphics, photo imaging, and video processing directly into applications. The NGX SDK makes it easy for developers to integrate AI features into their applications with pre-trained networks.
NIS
NVIDIA Image Scaling is a driver-based spatial upscaler and sharpener for GeForce GPUs for all games.
NPP
NVIDIA Performance Primitives (NPP) provides GPU-accelerated image, video, and signal processing functions.
NRD
NVIDIA Real-Time Denoisers (NRD) is a spatio-temporal, API-agnostic denoising library designed to work with low rays-per-pixel signals. It uses input signals and environmental conditions to deliver results comparable to ground-truth images.
Nsight Aftermath
Nsight™ Aftermath SDK is a simple library you integrate into your D3D12 or Vulkan game’s crash reporter to generate GPU "mini-dumps" when a TDR or exception occurs.
Nsight Compute
A developer tool for CUDA kernel profiling and debugging.
Nsight Deep Learning Designer
An integrated development environment for developers who wish to incorporate high-performance DL-based features into host applications either on the desktop or on the edge.
Nsight Graphics
NVIDIA® Nsight™ Graphics is a standalone developer tool that enables you to debug, profile, and export frames built with Direct3D (11, 12, DXR), Vulkan (1.2, NV Vulkan Ray Tracing Extension), OpenGL, OpenVR, and the Oculus SDK.
Nsight Perf SDK
NVIDIA® Nsight Perf SDK is a graphics profiling toolbox for DirectX, Vulkan, and OpenGL, enabling you to collect GPU performance metrics directly from your application.
Nsight Systems
NVIDIA® Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities to optimize, and tune to scale efficiently across any quantity or size of CPUs and GPUs; from large server to our smallest SoC.
Nsight Visual Studio Code Edition
NVIDIA Nsight™ Visual Studio Code Edition (VSCE) is an application development environment for heterogeneous platforms that brings CUDA® development for GPUs into Microsoft Visual Studio Code. NVIDIA Nsight™ VSCE enables you to build and debug GPU kernels and native CPU code as well as inspect the state of the GPU and memory.
Nsight Visual Studio Edition
NVIDIA® Nsight™ Visual Studio Edition is an application development environment for heterogeneous platforms which brings GPU computing into Microsoft Visual Studio. NVIDIA Nsight™ VSE allows you to build and debug integrated GPU kernels and native CPU code as well as inspect the state of the GPU and memory.
NVAPI
NVAPI is NVIDIA’s core software development kit that allows direct access to NVIDIA GPUs and drivers on all Windows platforms. NVAPI provides support for categories of operations that range beyond the scope of those found in familiar graphics APIs such as DirectX and OpenGL.
nvCOMP
nvCOMP is a high-performance, GPU-enabled data compression library that includes both open-source and non-open-source components. The nvCOMP library provides fast lossless data compression and decompression using a GPU. It features generic compression interfaces that enable developers to use high-performance GPU compressors in their applications.
NVIDIA PyTorch
NVIDIA works with Facebook and the community to accelerate PyTorch on NVIDIA GPUs in the main PyTorch branch, and also provides ready-to-run containers in NGC.
nvJPEG
The nvJPEG library is a high-performance GPU-accelerated library for decoding, encoding and transcoding JPEG format images.
nvJPEG2000
The nvJPEG2000 library is for decoding JPEG 2000 format images. Applications that rely on nvJPEG or nvJPEG2000 for decoding deliver higher throughput and lower latency compared to CPU-only decoding.
NVSHMEM
NVSHMEM implements the OpenSHMEM standard for GPU memory, with extensions for improved performance on GPUs. It is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters.
Omniverse
NVIDIA Omniverse™ is a scalable, multi-GPU, real-time reference development platform for 3D simulation and design collaboration, based on Pixar’s Universal Scene Description and NVIDIA RTX™ technology. NVIDIA Omniverse is built from the ground up to be easily extensible and customizable with a modular development framework. While end-users and content creators leverage the Omniverse platform to connect and accelerate their 3D workflows, developers can plug into the platform layer of the Omniverse stack to easily build new tools and services.
Omniverse Kaolin App
NVIDIA Omniverse Kaolin App is an interactive application for 3D deep learning researchers that allows inspecting 3D datasets, interactive visualization of 3D outputs of a model during training, and synthetic dataset rendering. Built on Omniverse Kit, the research application benefits from high-fidelity RTX rendering and will gain new functionality periodically from new extensions.
OpenACCOpenACC is a directive-based programming model designed to provide a simple yet powerful approach to accelerators without significant programming effort. With OpenACC, a single version of the source code delivers performance portability across platforms, offering scientists and researchers a quick path to accelerated computing. By inserting compiler “hints,” or directives, into your C11, C++17, or Fortran 2003 code, the NVIDIA OpenACC compiler lets you offload and run your code on the GPU and CPU.
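As a minimal sketch of the directive approach (the `saxpy` function below is our own example, not part of an NVIDIA SDK), the loop carries a standard OpenACC `parallel loop` directive with data clauses. Built with the NVIDIA HPC compiler (`nvc -acc`), the loop offloads to the GPU; a compiler without OpenACC support simply ignores the pragma and runs the same source serially on the CPU:

```c
#include <stddef.h>

/* SAXPY: y = a*x + y. The OpenACC directive asks the compiler to
 * parallelize the loop and manage host<->device data movement;
 * compilers without OpenACC support ignore the pragma entirely. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The same translation unit compiles unchanged for CPU-only builds, which is the "single version of the source code" portability the model advertises.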
Optical FlowOptical Flow SDK exposes the latest hardware capability of Turing and Ampere GPUs dedicated to computing the relative motion of pixels between images.
OptiXAn application framework for achieving optimal ray tracing performance on the GPU.
OptiX DenoiserThe OptiX denoiser uses GPU-accelerated artificial intelligence to dramatically reduce the time needed to render a high-fidelity image that is visually noiseless.
PhysXNVIDIA PhysX is a scalable multi-platform physics simulation solution supporting a wide range of devices, from smartphones to high-end multicore CPUs and GPUs. The powerful SDK brings high performance and precise accuracy to industrial simulation use cases, from traditional VFX and game development workflows to high-fidelity robotics, medical simulation, and scientific visualization applications.
RAPIDSRAPIDS is a suite of open-source software libraries and APIs for executing data science pipelines entirely on GPUs—and can reduce training times from days to minutes. Built on NVIDIA® CUDA-X AI™, RAPIDS unites years of development in graphics, machine learning, deep learning, high-performance computing (HPC), and more.
ReflexNVIDIA Reflex SDK allows game developers to implement a low-latency mode that aligns game engine work to complete just in time for rendering, eliminating the GPU render queue and reducing CPU back pressure in GPU-bound scenarios.
RivaRiva is a GPU-accelerated SDK for developing real-time Speech AI applications.
RivermaxRivermax is a unique IP-based solution that boosts video and data streaming performance. Together with NVIDIA GPU-accelerated computing technologies, Rivermax unlocks innovation for a wide range of applications in media and entertainment (M&E), broadcast, healthcare, smart cities, and more.
RTX DIImagine adding millions of dynamic lights to your game environments without worrying about performance or resource constraints. RTXDI makes this possible while rendering in real time. Geometry of any shape can now emit light and cast appropriate shadows: Tiny LEDs. Times Square billboards. Even exploding fireballs. RTXDI easily incorporates lighting from user-generated models. And all of these lights can move freely and dynamically.
RTX GILeveraging the power of ray tracing, the RTX Global Illumination SDK provides scalable solutions to compute multi-bounce indirect lighting without bake times, light leaks, or expensive per-frame costs. RTXGI is supported on any DXR-enabled GPU, and is an ideal starting point to bring the benefits of ray tracing to your existing tools, knowledge, and capabilities.
RTXGI UE4 PluginA plugin that integrates the RTX Global Illumination SDK into Unreal Engine 4, providing scalable solutions to compute multi-bounce indirect lighting without bake times, light leaks, or expensive per-frame costs.
RTX UE4 BranchOur branch of Unreal Engine 4 with RTX features integrated. NvRTX is a custom UE4 branch for NVIDIA technologies on GitHub. Having custom UE4 branches on GitHub shortens the development cycle, and helps make games look more stunning.
RTXMURTXMU combines compaction and suballocation techniques to optimize and reduce the memory consumption of acceleration structures in any DXR or Vulkan ray tracing application.
SDK ManagerNVIDIA SDK Manager provides an end-to-end development environment setup solution for NVIDIA’s DRIVE, Jetson, Clara Holoscan, Rivermax, DOCA and Ethernet Switch SDKs for both host and target devices.
Spark XGBoostGPU-accelerated enhancements to the XGBoost gradient boosting library provide fast and accurate ways to solve large-scale AI and data science problems.
Studio SDKNVIDIA Studio Stack is a set of software that provides digital content creators with the best performance and reliability when working with creative apps. It includes NVIDIA Studio SDKs and APIs for app developers and NVIDIA Studio Drivers for creators.
SwitchIBNVIDIA Quantum InfiniBand switches deliver a complete switch system and fabric management portfolio for connecting cloud-native supercomputing at any scale.
TAO ToolkitNVIDIA Train, Adapt, and Optimize (TAO) Toolkit gives you a faster, easier way to accelerate training and quickly create highly accurate, performant, domain-specific AI models. (Formerly Transfer Learning Toolkit/TLT.)
TensorRTNVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications.
Torch-TensorRTTorch-TensorRT is an integration for PyTorch that leverages TensorRT's inference optimizations on NVIDIA GPUs. With just one line of code, it provides a simple API that delivers up to a 6x performance speedup.
TensorFlow-TensorRTTensorFlow-TensorRT (TF-TRT) is an integration of TensorFlow and TensorRT that leverages inference optimization on NVIDIA GPUs within the TensorFlow ecosystem with just one line of code.
TensorRT - MXNetThe TensorRT backend for MXNet allows users to accelerate inference in MXNet with all the graph optimizations supported by TensorRT.
TensorRT - ONNX RuntimeTensorRT is integrated with ONNX Runtime as one of its execution providers, accelerating inference performance on GPUs with the TensorRT inference engine.
Texture Tools 3Create block-compressed textures and write custom asset pipelines using NVTT 3, an SDK for CUDA-accelerated texture compression and image processing.
Texture Tools ExporterThe new version of the Photoshop texture plugin allows creators to import and export GPU-compressed texture formats such as DDS and KTX, and to apply image-processing effects on the GPU. It uses the proprietary version of NVTT (NVIDIA Texture Tools) as its base library, with the Exporter distributed separately, and includes a command-line interface for scripting and use in developer toolchains.
ThrustThrust is a powerful library of parallel algorithms and data structures. Thrust provides a flexible, high-level interface for GPU programming that greatly enhances developer productivity. Using Thrust, C++ developers can write just a few lines of code to perform GPU-accelerated sort, scan, transform, and reduction operations orders of magnitude faster than the latest multi-core CPUs.
Triton Inference ServerNVIDIA Triton™ Inference Server delivers fast and scalable AI in production. Triton Inference Server streamlines AI inference by enabling teams to deploy, run, and scale trained AI models from any framework on any GPU- or CPU-based infrastructure.
Unified Compute FrameworkUCF is a fully accelerated framework for developing real-time edge AI applications.
Video Codec SDKA comprehensive set of APIs, including high-performance tools, samples, and documentation, for hardware-accelerated video encode and decode on Windows and Linux.
vMaterialsNVIDIA vMaterials are a curated collection of MDL materials and lights representing common real-world materials used in design and AEC workflows. Integrating the Iray or MDL SDK quickly brings a library of hundreds of ready-to-use materials to your application without writing shaders.
VRWorks GraphicsVRWorks™ is a comprehensive suite of APIs, libraries, and engines that enable application and headset developers to create amazing virtual reality experiences. VRWorks enables a new level of presence by bringing physically realistic visuals, sound, touch interactions, and simulated environments to virtual reality.
Warp & BlendWarp and Blend are interfaces exposed in NVAPI for warping (image geometry corrections) and blending (intensity and black level adjustment) a single display output or multiple display outputs.
WaveWorksNVIDIA WaveWorks enables developers to deliver a cinematic-quality ocean simulation for interactive applications. The simulation runs in the frequency domain, using a spectral wave model for wind waves, and displacements plus velocity potentials for interactive waves. A set of inverse FFT steps then transforms the results to the spatial domain, ready for rendering. The NVIDIA WaveWorks simulation is initialized and controlled by a simple C API, and the results are accessed for rendering as native graphics API objects. Parameterization is via intuitive real-world variables, such as wind speed and direction. These parameters can be used to tune the look of the sea surface for a wide variety of conditions, from gentle ripples to a heavy storm-tossed ocean based on the Beaufort scale.
XLIOXLIO is a user-space software library that exposes standard socket APIs with a kernel-bypass architecture, enabling hardware-based direct copies between an application's user-space memory and the network interface. XLIO boosts the performance of TCP/IP applications like NGINX, CDNs, and storage solutions such as NVMe over TCP (Non-Volatile Memory Express over TCP).
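Because XLIO sits underneath the standard socket API, applications typically need no source changes. The loopback round-trip below (an illustrative example of ours, using only POSIX sockets) is the kind of code that picks up kernel-bypass acceleration when launched under XLIO, e.g. via `LD_PRELOAD`:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Round-trip a message over a loopback TCP connection using only the
 * standard socket API. Under XLIO (LD_PRELOAD), these same calls are
 * serviced in user space with kernel bypass; no source changes needed. */
int tcp_loopback_roundtrip(const char *msg, char *reply, size_t cap)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the OS pick a free port */
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0) return -1;
    if (listen(srv, 1) < 0) return -1;

    socklen_t len = sizeof addr;             /* discover the chosen port */
    getsockname(srv, (struct sockaddr *)&addr, &len);

    int cli = socket(AF_INET, SOCK_STREAM, 0);
    if (connect(cli, (struct sockaddr *)&addr, sizeof addr) < 0) return -1;
    int conn = accept(srv, NULL, NULL);      /* connection already queued */

    send(cli, msg, strlen(msg), 0);          /* client -> server */
    ssize_t n = recv(conn, reply, cap - 1, 0);
    if (n < 0) n = 0;
    reply[n] = '\0';

    close(conn);
    close(cli);
    close(srv);
    return (int)n;                           /* bytes echoed back */
}
```

The key point is that nothing here references XLIO: the library intercepts `socket`, `send`, `recv`, and friends transparently, which is what makes it attractive for existing applications like NGINX.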