Technical Resources for Accelerating Creative Applications
NVIDIA offers a range of SDKs and technologies for building and optimizing AI applications deployed on NVIDIA GPUs. After identifying which technologies apply to your use case, there are still opportunities and best practices for optimizing on GPUs.
Maximizing Productivity with NVIDIA AI Solutions
Video Applications
The Audio Effects SDK delivers multi-effect, low-latency audio quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.
The Video Effects SDK enables AI-based visual effects that run with standard webcam input and can be easily integrated into video conference pipelines. The underlying deep learning models are optimized with NVIDIA AI using NVIDIA TensorRT™ for high-performance inference, making it possible for developers to apply multiple effects in real-time applications.
The NVIDIA® Optical Flow SDK exposes the latest hardware capability of NVIDIA Turing, Ampere, and Ada architecture GPUs dedicated to computing the relative motion of pixels between images. The hardware uses sophisticated algorithms to yield highly accurate flow vectors, which are robust to frame-to-frame intensity variations and track the true object motion.
The NVIDIA Video Codec SDK is a comprehensive set of APIs including high-performance tools, samples and documentation for hardware accelerated video encode and decode on Windows and Linux.
Watch Video: How To Remove Background Noise
with NVIDIA Maxine’s Audio Effects SDK
Broadcast Applications
The Audio Effects SDK delivers multi-effect, low-latency audio quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.
NVIDIA DLSS (Deep Learning Super Sampling) is a neural graphics technology that multiplies performance using AI to create entirely new frames and display higher resolution through image reconstruction—all while delivering best-in-class image quality and responsiveness.
The NVIDIA Video Codec SDK is a comprehensive set of APIs including high-performance tools, samples and documentation for hardware accelerated video encode and decode on Windows and Linux.
Watch Webinar: NVIDIA DLSS and Enscape: Introducing the
Latest Technology in Real-Time Visualization
3D & Graphic Design Applications
The Augmented Reality SDK offers AI-powered, real-time 3D face tracking and body pose estimation based on a standard webcam feed. Developers can create unique AR effects such as overlaying 3D content on a face — driving 3D characters and virtual interactions in real time.
The NVIDIA Material Definition Language (MDL) SDK is a set of tools to enable quick integration of physically-based materials into rendering applications.
Iray is a high-performance, global illumination rendering technology that generates imagery by simulating the physical behavior of light interaction with surfaces and volumes. Images are progressively refined to provide full global illumination—including caustics, sun studies, and luminance distributions.
NVIDIA OptiX™ Ray Tracing Engine is an application framework for achieving optimal ray tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray tracing algorithms. Bring the power of NVIDIA GPUs to your ray tracing applications with programmable intersection, ray generation, and shading.
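The "programmable intersection" primitive mentioned above can be illustrated in a few lines of plain Python: given a ray origin O and direction D, solve |O + tD - C|² = r² for the nearest hit distance t against a sphere. OptiX runs intersection programs like this massively in parallel on the GPU; this sketch shows only the math, not the OptiX API.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the nearest positive hit distance t, or None if the ray misses."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    a = dx * dx + dy * dy + dz * dz
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # nearer of the two roots
    return t if t > 0.0 else None

# A ray along +z from the origin hits a unit sphere centered at (0, 0, 5)
# at t = 4, i.e. the front surface of the sphere.
print(ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0
```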
NVIDIA DLSS (Deep Learning Super Sampling) is a neural graphics technology that multiplies performance using AI to create entirely new frames and display higher resolution through image reconstruction—all while delivering best-in-class image quality and responsiveness.
Leveraging the power of ray tracing, the RTX Global Illumination SDK provides scalable solutions to compute multi-bounce indirect lighting without bake times, light leaks, or expensive per-frame costs. RTXGI is supported on any DXR-enabled GPU, and is an ideal starting point to bring the benefits of ray tracing to your existing tools, knowledge, and capabilities.
NVIDIA Kaolin library provides a PyTorch API for working with a variety of 3D representations. It includes a growing collection of GPU-optimized operations such as modular differentiable rendering, fast conversions between representations, data loading, camera classes, volumetric acceleration data structures, 3D checkpoints, and more.
Photography Applications
The NVIDIA Data Loading Library (DALI) is a portable, open source library for decoding and augmenting images, videos and speech to accelerate deep learning applications. DALI reduces latency and training time, mitigating bottlenecks, by overlapping training and pre-processing.
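The overlap DALI exploits can be sketched with nothing but the standard library: while the consumer works on batch N, a background thread is already preparing batch N+1, so preprocessing time hides behind compute time. DALI does this natively (including GPU-side decoding); this stdlib sketch demonstrates only the pipelining concept, with sleeps standing in for real work.

```python
import queue
import threading
import time

def preprocess(i):
    time.sleep(0.01)  # stand-in for decode/augment work
    return i * 2

def producer(n, q):
    for i in range(n):
        q.put(preprocess(i))
    q.put(None)  # sentinel: no more batches

q = queue.Queue(maxsize=2)  # small prefetch buffer, like DALI's pipeline depth
threading.Thread(target=producer, args=(8, q), daemon=True).start()

results = []
while (batch := q.get()) is not None:
    time.sleep(0.01)  # stand-in for a training step; overlaps with preprocessing
    results.append(batch)

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```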
Our researchers developed state-of-the-art image reconstruction that fills in missing parts of an image with new pixels generated by the trained model, independently of what's missing in the photo. Try it with a landscape or a portrait.
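A hedged 1-D sketch of the mask-aware ("partial") convolution idea behind the inpainting research linked below: only known pixels contribute to each output, and the sum is renormalized by the kernel weight that actually saw valid pixels, so holes get filled from their surroundings. The real model stacks many learned 2-D layers; this is a single hand-written step for illustration only.

```python
def partial_conv_1d(x, mask, kernel):
    """x: pixel values, mask: 1 = known pixel, kernel: odd-length weights."""
    r = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = valid_w = 0.0
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < len(x) and mask[j]:
                acc += w * x[j]
                valid_w += w
        # Renormalize by the weight that covered known pixels only.
        out.append(acc / valid_w if valid_w else 0.0)
    return out

x = [9.0, 9.0, 0.0, 9.0, 9.0]   # the 0.0 is a hole with an unknown value
mask = [1, 1, 0, 1, 1]
out = partial_conv_1d(x, mask, [1 / 3, 1 / 3, 1 / 3])
print([round(v, 6) for v in out])  # [9.0, 9.0, 9.0, 9.0, 9.0]
```

Note how the hole at index 2 is reconstructed as 9.0, the renormalized average of its known neighbors, while known pixels are preserved.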
Watch Video: Fast Data Preprocessing (Image, Video, Audio and Signal)
with DALI, NPP and nvJPEG
Read Resources: Partial Convolution Layer for Padding and Image Inpainting
Audio Applications
NVIDIA Riva is a GPU-accelerated speech AI SDK for building and deploying fully customizable, real-time AI pipelines that deliver world-class accuracy in all clouds, on-premises, at the edge and on embedded devices.
NVIDIA NeMo is an open-source framework for developers to build and train state-of-the-art conversational AI models. With NeMo, you can build models for real-time automated speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications such as video call transcription and intelligent video assistants.
View Document: NeMo tutorials
Read Blog: An Easy Introduction to Speech AI
Best Practices
Profile Before Optimizing
Profiling allows you to pinpoint where optimization is critical and where small changes can have a big impact. Optimizing without first understanding where it is necessary may result in little to no performance gains.
NVIDIA Nsight™ Systems is a low-overhead performance analysis tool designed to provide insights developers need to optimize their software. Nsight Systems is simple to use and provides information ranging from high to low level. Creating an application trace takes only a few minutes and provides all the insights you need to determine the optimization that promises the best return on investment.
Profiling Best Practices:
- Profile in a clean environment: close other apps that consume resources and add noise to the application trace.
- Triage, then optimize. Where is the bottleneck: inference, pre-/post-processing, or PCIe transfers?
- Reprofile after every change or optimization. Fixing one bottleneck can have unforeseen side effects on performance.
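The triage step above can be as simple as wrapping each pipeline stage with a timer so the dominant cost is obvious before you change any code. Nsight Systems gives you this (and far more detail) without manual instrumentation, but even a crude stdlib timer beats guessing. The stage names and sleeps below are placeholders, not a real pipeline.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent inside the block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

with timed("preprocess"):
    time.sleep(0.02)   # placeholder: decode + resize
with timed("inference"):
    time.sleep(0.05)   # placeholder: model execution
with timed("postprocess"):
    time.sleep(0.01)   # placeholder: drawing results

bottleneck = max(timings, key=timings.get)
print(bottleneck)  # inference
```

With the bottleneck identified, you optimize the stage that promises the best return on investment first, then reprofile.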
Use All Resources Efficiently
Your system can execute the following four operations simultaneously; take advantage of this by executing them asynchronously:
- CPU processing
- GPU processing
- CPU-to-GPU copies via bidirectional PCIe bus
- GPU-to-CPU copies via bidirectional PCIe bus
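A stdlib sketch of that overlap: issue the next host-to-device copy while the current chunk computes, rather than running copy, compute, copy in strict sequence. In real CUDA code you would use pinned memory, asynchronous copies, and streams; here threads and sleeps merely simulate the two engines to show the scheduling pattern.

```python
import threading
import time
from queue import Queue

def copy_to_gpu(chunk):      # simulated PCIe transfer
    time.sleep(0.02)
    return chunk

def compute(chunk):          # simulated GPU kernel
    time.sleep(0.02)
    return sum(chunk)

chunks = [[1, 2], [3, 4], [5, 6]]

# Serial baseline: copy, then compute, one chunk at a time.
t0 = time.perf_counter()
serial = [compute(copy_to_gpu(c)) for c in chunks]
serial_s = time.perf_counter() - t0

# Overlapped: a copy thread stays one chunk ahead of the compute loop.
q = Queue(maxsize=1)
def copier():
    for c in chunks:
        q.put(copy_to_gpu(c))
    q.put(None)  # sentinel: all chunks transferred

t0 = time.perf_counter()
threading.Thread(target=copier, daemon=True).start()
overlapped = []
while (c := q.get()) is not None:
    overlapped.append(compute(c))
overlapped_s = time.perf_counter() - t0

print(serial == overlapped, overlapped_s < serial_s)  # True True
```

The results are identical, but in the overlapped version all copies except the first hide behind compute time.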
Train With Mixed Precision
Tensor Cores offer hardware-accelerated convolutions on the Volta, Turing, Ampere, and Ada architectures. Lower-precision arithmetic using TF32, FP16, or INT8 will boost performance.
- TF32 is a tradeoff between FP32 and FP16 in terms of performance and accuracy.
- FP16 accuracy is sufficient for most use cases; it enables Tensor Cores while reducing VRAM pressure.
- INT8 offers the best performance but can reduce accuracy and requires calibration.
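The size/precision tradeoff behind these formats can be demonstrated on the CPU with Python's struct codes: 'e' is FP16, 'f' is FP32, 'd' is FP64. Half precision halves the memory per value but keeps only about three decimal digits, which is why FP16 suffices for most workloads while FP32 remains the accuracy reference. (TF32 and Tensor Core execution are GPU-side and not reproducible here.)

```python
import struct

def round_trip(fmt, value):
    """Store `value` at the given precision and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

x = 3.14159265358979
print(struct.calcsize("e"), round_trip("e", x))  # 2 bytes, ~3.140625
print(struct.calcsize("f"), round_trip("f", x))  # 4 bytes, ~3.1415927
print(struct.calcsize("d"), round_trip("d", x))  # 8 bytes, full precision
```

Halving bytes per value also halves memory traffic, which is often as important as the raw arithmetic speedup.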
Latest Blogs
End-to-End AI for Workstation: An Introduction to Optimization
Check out our blog series on optimizing end-to-end AI for workstations, covering use cases specific to several APIs, including NVIDIA TensorRT, NVIDIA cuDNN, ONNX Runtime, and Microsoft WinML.
End-to-End AI for Workstation: Transitioning AI Models with ONNX
Continue optimizing end-to-end AI for workstations by learning how to use ONNX to transition your AI models from research to production while avoiding common mistakes.
Deploying Diverse AI Model Categories from Public Model Zoo Using NVIDIA Triton Inference Server
This post gives you an overview of prevalent deep learning model categories and walks you through the end-to-end examples of deploying these models using NVIDIA Triton Inference Server.