Getting Started With NVIDIA AI for Your Applications

NVIDIA offers a range of AI tools, software development kits (SDKs), and technologies that can be used to optimize and enhance applications for deployment on NVIDIA GPUs. For additional support, reach out with questions on the Developer Forums or the NVIDIA Developer Discord.

Post in Developer Forums  Join Our Discord Channel

Profile and Optimize Your Pipeline

Profile Before Optimizing

Profiling allows you to pinpoint where optimization is critical and where small changes can have a big impact. Optimizing without first understanding where it is necessary may result in little to no performance gains.
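To make this concrete, the sketch below applies Amdahl's law, which the text above does not name but which is a standard way to estimate the end-to-end gain from speeding up a single stage. The function name `overall_speedup` and the example percentages are illustrative assumptions, not measurements from a real pipeline.

```python
def overall_speedup(stage_fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when one stage gets faster.

    stage_fraction: share of total runtime spent in the stage (0..1).
    stage_speedup:  how many times faster that stage becomes (> 0).
    """
    if not 0.0 <= stage_fraction <= 1.0:
        raise ValueError("stage_fraction must be between 0 and 1")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    return 1.0 / ((1.0 - stage_fraction) + stage_fraction / stage_speedup)


# Hypothetical numbers: doubling the speed of a stage that takes 10% of the
# runtime barely helps, while doubling a stage that takes 70% helps a lot.
minor = overall_speedup(0.10, 2.0)  # about 1.05x end to end
major = overall_speedup(0.70, 2.0)  # about 1.54x end to end
```

The fractions fed into such an estimate should come from a profile of the real application rather than from intuition.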

NVIDIA Nsight™ Systems is a system-wide performance analysis tool that is simple to use and allows you to visualize CPU-GPU interaction, track GPU activity, and trace GPU workloads. Creating an application trace takes only a few minutes and provides all the insights you need to determine the optimization that promises the best return on investment.

Figure: NVIDIA Nsight Systems can help you understand your performance bottlenecks. The timeline shows DX12 API calls in chronological order alongside the render threads.
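As a rough illustration of how a trace might be captured from the command line, the sketch below assembles an `nsys profile` invocation. The helper name `nsys_profile_command`, the output name `app_trace`, and the script `run_pipeline.py` are hypothetical. The chosen trace categories are only one reasonable starting point, so check the Nsight Systems documentation for the options available on your platform.

```python
import shlex


def nsys_profile_command(app_args, output="app_trace", traces=("cuda", "nvtx")):
    """Build an `nsys profile` command line for the given application.

    app_args: the executable to profile followed by its own arguments.
    output:   base name of the report file that nsys writes.
    traces:   API categories to trace, joined into one --trace option.
    """
    if not app_args:
        raise ValueError("app_args must contain the executable to profile")
    return [
        "nsys", "profile",
        f"--output={output}",
        f"--trace={','.join(traces)}",
        *app_args,
    ]


# Hypothetical application entry point; replace with your own command.
cmd = nsys_profile_command(["python", "run_pipeline.py"])
print(shlex.join(cmd))
# prints: nsys profile --output=app_trace --trace=cuda,nvtx python run_pipeline.py
```

The helper only builds the command. Running it, for example with `subprocess.run(cmd, check=True)`, requires Nsight Systems to be installed and `nsys` to be on the PATH.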

Best Practices for Profiling and Optimizing

  • Profile in a clean environment. Close other apps that use resources and add noise to the application trace.
  • Triage, then optimize. What is the bottleneck? Inference, pre- or post-processing, or PCIe transfers?
  • Reprofile after every change or optimization. Fixing one bottleneck might have unforeseen side effects on performance.
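Before reaching for a full trace, the triage step in the list above can be approximated with a few host-side timers. The sketch below is a minimal, hypothetical `StageTimer` helper built on the Python standard library. It is not part of any NVIDIA tool, and the three stages are stand-ins for real pre-processing, inference, and post-processing code. GPU work is typically launched asynchronously, so a host-side timer reflects GPU time only if the stage waits for the GPU to finish before it returns.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock              # injectable so tests can fake time
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            self.totals[name] += self._clock() - start

    def shares(self):
        """Return each stage's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: t / total for name, t in self.totals.items()}

    def bottleneck(self):
        """Return the stage with the largest accumulated time, or None."""
        if not self.totals:
            return None
        return max(self.totals, key=self.totals.get)


timer = StageTimer()
with timer.stage("preprocess"):
    data = [x / 255.0 for x in range(10_000)]
with timer.stage("inference"):
    result = sum(v * v for v in data)    # stand-in for the model call
with timer.stage("postprocess"):
    label = "high" if result > 1.0 else "low"

for name, share in timer.shares().items():
    print(f"{name}: {share:.1%}")
print("bottleneck:", timer.bottleneck())
```

Running the same measurement again after each change gives a quick check on whether the bottleneck has moved, which a full reprofile can then confirm.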

Download Our Best Practices Guide

Accelerate Your AI Pipeline

Choosing a Machine Learning Framework

Several factors come into play when selecting the optimal machine learning framework for deploying an AI model. Given the effort required to switch between frameworks, it’s important to ensure that the initial selection is the most appropriate for your needs. NVIDIA fully supports and recommends TensorRT and WinML for local deployment.

TensorRT is optimized for highest-performance inferencing on NVIDIA GPUs. It runs only on NVIDIA GPUs, while WinML can work with a variety of GPU hardware.

                    WinML      TensorRT
Deployment path     Direct, from most frameworks via ONNX (both)
OS Support          Windows    Windows & Linux
Hardware Support    Any GPU    NVIDIA GPUs
Performance         Fast       Fastest
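The comparison above can be read as a simple decision rule. The sketch below encodes only the OS and hardware rows of that comparison. The function name `suggest_framework` is hypothetical, and a real selection would weigh further factors, as the text notes.

```python
def suggest_framework(os_name: str, gpu_vendor: str):
    """Suggest a deployment framework from the OS and GPU vendor.

    Mirrors the comparison above: TensorRT targets NVIDIA GPUs on Windows
    and Linux and is listed as fastest; WinML targets any GPU on Windows.
    Returns None when neither row applies.
    """
    os_key = os_name.strip().lower()
    vendor_key = gpu_vendor.strip().lower()
    if vendor_key == "nvidia" and os_key in ("windows", "linux"):
        return "TensorRT"
    if os_key == "windows":
        return "WinML"
    return None


choice = suggest_framework("Windows", "NVIDIA")   # "TensorRT"
fallback = suggest_framework("Windows", "AMD")    # "WinML"
```

On Windows with an NVIDIA GPU both frameworks apply. The sketch prefers TensorRT because the comparison lists it as fastest, but an application that must also run on other vendors' GPUs may still choose WinML.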

Resources for WinML

Resources for TensorRT

Resources for Generative AI

Accelerate With DirectML

Use DirectML to accelerate generative AI applications. The benefit of DirectML is that the same optimization will run on any hardware.

How to Optimize Models like Stable Diffusion With Olive

Blog: DirectML for Stable Diffusion

Accelerate With TensorRT

Leverage optimizations in TensorRT 8.6 to accelerate generative AI models such as Stable Diffusion, Llama 2, Mistral-7B, and NVGPT-8B. The benefit of TensorRT is getting the best performance out of the GPU, whether on NVIDIA's data center systems or locally on native Windows with NVIDIA RTX systems.
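One common entry point into TensorRT is the `trtexec` command-line tool that ships with it, which can build an engine from an ONNX file. The sketch below only assembles such a command. The helper name `trtexec_command` and the file names `model.onnx` and `model.plan` are hypothetical, and whether FP16 is appropriate depends on the model. Multi-component pipelines such as Stable Diffusion need more than a single invocation like this one.

```python
import shlex
import shutil


def trtexec_command(onnx_path, engine_path, fp16=True):
    """Build a `trtexec` command that converts an ONNX model to an engine.

    onnx_path:   path to the exported ONNX model (assumed to exist).
    engine_path: where trtexec should write the serialized engine.
    fp16:        request half-precision kernels where TensorRT allows them.
    """
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")
    return cmd


cmd = trtexec_command("model.onnx", "model.plan")
print(shlex.join(cmd))
# prints: trtexec --onnx=model.onnx --saveEngine=model.plan --fp16

# The command is not executed here; this only reports whether it could be.
if shutil.which("trtexec") is None:
    print("trtexec not found on PATH; command shown for reference only")
```

An engine is built for a specific GPU and TensorRT version, so this step is usually repeated on, or for, each deployment target.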

How to Optimize Models like Stable Diffusion With TensorRT

Example TRT Pipeline for Stable Diffusion

Demo application that showcases the acceleration of the Stable Diffusion pipeline using TensorRT

TensorRT Extension for Stable Diffusion

New Stable Diffusion Models Accelerated with NVIDIA TensorRT

Blog: Unlock Faster Image Generation in Stable Diffusion Web UI with NVIDIA TensorRT

TensorRT Toolbox for Large Language Models

RAG on Windows using TensorRT-LLM and LlamaIndex

TensorRT-LLM for Windows


NVIDIA SDKs let developers add AI features to their applications, broadening what those applications can do and improving the user experience.

Video and Broadcast

Audio Effects SDK

The Audio Effects SDK delivers multi-effect, low-latency audio-quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.

Learn More

Augmented Reality SDK

The Augmented Reality SDK offers AI-powered, real-time 3D face tracking and body pose estimation based on a standard webcam feed. Developers can create unique AR effects, such as overlaying 3D content on a face or driving 3D characters and virtual interactions in real time.

Learn More

Video Effects SDK

The Video Effects SDK enables AI-based visual effects that run with standard webcam input and can be easily integrated into video conference and broadcast pipelines. The underlying deep learning models are optimized with NVIDIA AI using TensorRT for high-performance inference, enabling developers to apply multiple effects in real-time applications.

Learn More

3D and Graphic Design

Audio2Face SDK

NVIDIA Omniverse™ Audio2Face beta is a reference application that simplifies animation of a 3D character to match any voice-over track, whether a user is animating characters for a game, a film, a real-time digital assistant, or just for fun.

Learn More


DLSS

NVIDIA DLSS is a neural graphics technology that multiplies performance using AI to create entirely new frames and display higher resolution through image reconstruction, all while delivering best-in-class image quality and responsiveness.

Learn More

OptiX Ray Tracing Engine

NVIDIA OptiX™ Ray Tracing Engine is an application framework for achieving optimal ray-tracing performance on the GPU. It provides a simple, recursive, and flexible pipeline for accelerating ray-tracing algorithms, including an advanced AI denoiser. Bring the power of NVIDIA GPUs to ray-tracing applications with programmable intersection, ray generation, and shading.

Learn More


Data Loading Library

The NVIDIA Data Loading Library (DALI) is a portable, open-source library for decoding and augmenting images, videos, and speech to accelerate deep learning applications. DALI reduces latency and training time, and mitigates bottlenecks by overlapping training and preprocessing.
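The idea of overlapping preprocessing with training can be illustrated without DALI itself. The sketch below is a generic standard-library prefetcher, not DALI's API: a background thread prepares upcoming batches while the consumer works on the current one. The names `prefetch`, `depth`, and the toy preprocessing function are assumptions made for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch(batches, preprocess, depth=2):
    """Yield preprocess(batch) in order, preparing up to `depth` batches ahead.

    A single worker thread fills a bounded queue, so preprocessing of the
    next batches proceeds while the caller consumes the current one.
    """
    buffer = queue.Queue(maxsize=depth)

    def worker():
        try:
            for batch in batches:
                buffer.put(preprocess(batch))
        finally:
            buffer.put(_DONE)  # always signal completion, even on error

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


# Toy usage: doubling stands in for decoding and augmentation.
processed = list(prefetch(range(5), lambda x: x * 2))
# processed == [0, 2, 4, 6, 8]
```

This sketch keeps error handling minimal: if `preprocess` raises, the stream simply ends early. In CPython, the overlap pays off mainly when the consumer's work runs on the GPU or otherwise releases the interpreter lock.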

Learn More


StyleGAN3

StyleGAN3 is a generative adversarial network (GAN) for creating high-quality, realistic, and diverse images. It offers a great level of control over different aspects of the generated images, such as facial features, hair, and clothing styles.

Learn More


Audio Effects SDK

The Audio Effects SDK delivers multi-effect, low-latency audio quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.

Learn More


NeMo

NVIDIA NeMo is an open-source framework for developers to build and train state-of-the-art conversational AI models. With NeMo, users can build models for real-time automated speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications such as video call transcriptions and intelligent video assistants.

Learn More


Riva

NVIDIA Riva is a GPU-accelerated speech AI SDK for building and deploying fully customizable, real-time AI pipelines that deliver world-class accuracy in all clouds, on premises, at the edge, and on embedded devices.

Learn More

Resources: Examples of End-to-End Optimizations

Got a question? Ask through our forums and Discord channels below.

Post in Developer Forums  Join Our Discord Channel