Getting Started With NVIDIA AI for Your Applications

NVIDIA offers a range of AI tools, software development kits (SDKs), and technologies that can be used to optimize and enhance applications for deployment on NVIDIA GPUs. For additional support, reach out with questions on the Developer Forums or the NVIDIA Developer Discord.

Post in Developer Forums Join Our Discord Channel

Profile and Optimize Your Pipeline

Profile Before Optimizing

Profiling allows you to pinpoint where optimization is critical and where small changes can have a big impact. Optimizing without first understanding where it is necessary may result in little to no performance gains.

NVIDIA Nsight™ Systems is a system-wide performance analysis tool that is simple to use and allows you to visualize CPU-GPU interaction, track GPU activity, and trace GPU workloads. Creating an application trace takes only a few minutes and shows you which optimization promises the best return on investment.

DX12 API calls as they happen chronologically in the timeline alongside render threads.

Best Practices for Profiling and Optimizing

Profile in a clean environment. Close other apps that utilize resources and add noise to the application trace.
Triage then optimize. What is the bottleneck? Inference, pre- or post-processing, PCIe transfers?
Reprofile after every change or optimization. Fixing one bottleneck might have unforeseen side effects on performance.
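
As a rough first pass at triage, assuming a Python pipeline, a few lines of timing code can show which stage dominates before you open a full trace. The sketch below is illustrative only: the stage names and the body of `run_frame` are hypothetical placeholders, and CPU wall-clock timers can misattribute GPU work because GPU calls are typically asynchronous. Treat it as a complement to an Nsight Systems trace, not a substitute.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per pipeline stage.
stage_totals = defaultdict(float)


@contextmanager
def stage(name):
    """Time one pipeline stage and add it to the running total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[name] += time.perf_counter() - start


def run_frame(frame):
    # Hypothetical stand-ins for real work; replace with your own calls.
    with stage("preprocess"):
        data = [x / 255.0 for x in frame]
    with stage("inference"):
        time.sleep(0.02)  # placeholder for the model call
        result = sum(data)
    with stage("postprocess"):
        return round(result, 3)


def report():
    """Return (stage, seconds) pairs sorted from most to least expensive."""
    return sorted(stage_totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for _ in range(5):
        run_frame(list(range(256)))
    for name, seconds in report():
        print(f"{name:12s} {seconds * 1000:8.2f} ms")
```

Rerunning the same report after each change gives a quick before-and-after comparison, in line with the advice to reprofile after every optimization.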

Download Our Best Practices Guide

Accelerate Your AI Pipeline

Choosing a Machine Learning Framework

Several factors come into play when selecting the optimal machine learning framework for deploying an AI model. Given the effort required to switch between frameworks, it’s important to ensure that the initial selection is the most appropriate for your needs. NVIDIA fully supports and recommends TensorRT and WinML for local deployment.

TensorRT is optimized for highest-performance inferencing on NVIDIA GPUs. It runs only on NVIDIA GPUs, while WinML can work with a variety of GPU hardware.

	WinML	TensorRT
Direct deployment path from most frameworks via ONNX	✓	✓
OS Support	Windows	Windows & Linux
Hardware Support	Any GPU	NVIDIA GPUs
Performance	Fast	Fastest
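
One way to act on this comparison at run time is to try the fastest backend first and fall back to the more portable one. The sketch below assumes ONNX Runtime as a common front end, which this page does not prescribe; calling TensorRT or WinML directly is equally valid. The preference order and the helper names `pick_providers` and `create_session` are illustrative assumptions.

```python
# Preference order follows the table above: TensorRT where it is available,
# then the DirectML execution provider, then a CPU fallback.
PREFERRED = [
    "TensorrtExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError("no usable execution provider found")
    return chosen


def create_session(model_path):
    """Open an ONNX model with the best backend this machine offers."""
    import onnxruntime as ort  # imported lazily; assumed to be installed

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Keeping the selection logic in a pure function makes it easy to check on a machine without a GPU, while `create_session` is only called where ONNX Runtime is installed.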

Resources for WinML

Beginner

Blog: Using Windows ML, ONNX, and NVIDIA Tensor Cores

Blogs: End-to-End AI

Documentation: DirectML Execution Provider

Video: Workstation Inference With TensorRT, cuDNN, and WinML

Intermediate

GitHub page: Samples and Tools for WinML

Video: Workstation Inference With TensorRT, cuDNN, and WinML

Video: End-to-End AI With ONNX and DirectX 12 on Workstation

Video: Connect With the Experts: End-to-End AI Deployment for Workstation Applications

Resources for TensorRT

Beginner

Blog: Optimizing and Serving Models With NVIDIA TensorRT and NVIDIA Triton

Blogs: End-to-End AI

Video: Getting Started With NVIDIA TensorRT

Video: Workstation Inference With TensorRT, cuDNN, and WinML

Intermediate

Blog: Speeding Up Deep Learning Inference Using TensorRT

Blog: Estimating Depth With ONNX Models and Custom Layers Using NVIDIA TensorRT

Blog: Tips for Optimizing GPU Performance Using Tensor Cores

Resources for Generative AI

Accelerate With DirectML

Use DirectML to accelerate generative AI applications. The benefit of DirectML is that the same optimization runs across GPU hardware from different vendors.

How to Optimize Models like Stable Diffusion With Olive

Blog: DirectML for Stable Diffusion

Accelerate With TensorRT

Leverage optimizations in TensorRT 8.6 to accelerate generative AI models, such as Stable Diffusion, Llama2, Mistral-7B, and NVGPT-8B. The benefit of TensorRT is that it gets the best performance out of the GPU, whether on NVIDIA's data center systems or locally on native Windows with NVIDIA RTX systems.

How to Optimize Models like Stable Diffusion With TensorRT

Example TRT Pipeline for Stable Diffusion

Demo application that showcases the acceleration of the Stable Diffusion pipeline using TensorRT

TensorRT Extension for Stable Diffusion

New Stable Diffusion Models Accelerated with NVIDIA TensorRT

Blog: Unlock Faster Image Generation in Stable Diffusion Web UI with NVIDIA TensorRT

TensorRT Toolbox for Large Language Models

RAG on Windows using TensorRT-LLM and LlamaIndex

TensorRT-LLM for Windows

NVIDIA AI SDKs

NVIDIA SDKs let developers add AI features to their applications without building and optimizing the underlying models themselves.

Video and Broadcast

Audio Effects SDK

The Audio Effects SDK delivers multi-effect, low-latency audio-quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.

Augmented Reality SDK

The Augmented Reality SDK offers AI-powered, real-time 3D face tracking and body pose estimation based on a standard webcam feed. Developers can create unique AR effects such as overlaying 3D content on a face—driving 3D characters and virtual interactions in real time.

Video Effects SDK

The Video Effects SDK enables AI-based visual effects that run with standard webcam input and can be easily integrated into video conference and broadcast pipelines. The underlying deep learning models are optimized with NVIDIA AI using TensorRT for high-performance inference, enabling developers to apply multiple effects in real-time applications.

3D and Graphic Design

Audio2Face SDK

NVIDIA Omniverse™ Audio2Face beta is a reference application that simplifies animation of a 3D character to match any voice-over track, whether a user is animating characters for a game, a film, a real-time digital assistant, or just for fun.

NVIDIA DLSS

NVIDIA DLSS is a neural graphics technology that multiplies performance using AI to create entirely new frames and display higher resolution through image reconstruction—all while delivering best-in-class image quality and responsiveness.

OptiX Ray Tracing Engine

NVIDIA OptiX™ Ray Tracing Engine is an application framework for achieving optimal ray-tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray-tracing algorithms, including an advanced AI denoiser. Bring the power of NVIDIA GPUs to ray-tracing applications with programmable intersection, ray generation, and shading.

Photography

Data Loading Library

The NVIDIA Data Loading Library (DALI) is a portable, open-source library for decoding and augmenting images, videos, and speech to accelerate deep learning applications. DALI reduces latency and training time, and mitigates bottlenecks by overlapping training and preprocessing.
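
The idea of overlapping preprocessing with training can be sketched in plain Python with a background thread and a bounded queue. This is not DALI's API; the `prefetch` helper and its `depth` parameter are hypothetical names used only to illustrate the pattern, and in pure Python the overlap only pays off where the work releases the interpreter lock (I/O or native code).

```python
import queue
import threading


def prefetch(batches, preprocess, depth=2):
    """Yield preprocessed batches while a worker thread prepares the next ones."""
    q = queue.Queue(maxsize=depth)  # bounded, so the worker cannot run far ahead
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for batch in batches:
                q.put(preprocess(batch))
        finally:
            # Always signal completion so the consumer never waits forever.
            q.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item


if __name__ == "__main__":
    raw = [[1, 2], [3, 4], [5, 6]]
    for batch in prefetch(raw, lambda b: [x * 2 for x in b]):
        # The training step would consume the batch here while the
        # worker is already preparing the following one.
        print(batch)
```

The bounded queue is the main design choice: it lets preprocessing stay a step or two ahead of the consumer without buffering the whole dataset in memory.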

StyleGAN3

StyleGAN3 is a generative adversarial network (GAN) for creating high-quality, realistic, and diverse images. It offers a high level of control over aspects of the generated images, such as facial features, hair, and clothing styles.

Audio

Audio Effects SDK

The Audio Effects SDK delivers multi-effect, low-latency audio-quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.

NeMo

NVIDIA NeMo is an open-source framework for developers to build and train state-of-the-art conversational AI models. With NeMo, users can build models for real-time automated speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications such as video call transcriptions and intelligent video assistants.

Riva

NVIDIA Riva is a GPU-accelerated speech AI SDK for building and deploying fully customizable, real-time AI pipelines that deliver world-class accuracy in all clouds, on premises, at the edge, and on embedded devices.

Resources: Examples of End-to-End Optimizations
