How AI Inference Works
AI inference follows training, also known as induction. Training is the process of creating a model by fitting an algorithm, such as a neural network, to labeled data. The model learns to predict expected outcomes by generalizing patterns in the labeled training data. The model is then tested and validated on unseen data to ensure its quality. Once the model passes testing, it can be used in production for inference.

Inference is the process of providing unlabeled data to a trained model, which returns a prediction or label for that input. Inference powers many types of applications, such as large language models (LLMs), forecasting, and predictive analytics. At its core, all neural network inference is inputting numbers and outputting numbers; the processing of data before and after the model runs is what differentiates the types of inference. For example, in an LLM, a prompt has to be turned into numbers for the input, and the output numbers have to be turned back into words.
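The sketch below illustrates that numbers-in, numbers-out view of LLM inference, with tokenization as the pre-processing step and decoding as the post-processing step. It assumes the Hugging Face transformers library and the publicly available gpt2 checkpoint, which are illustrative choices rather than anything prescribed by this page; any causal language model would behave the same way.

```python
# Minimal sketch of LLM inference: pre-processing turns words into numbers (token IDs),
# the model maps numbers to numbers, and post-processing turns the output numbers back
# into words. Assumes the "transformers" library and the public "gpt2" checkpoint.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "AI inference is"
inputs = tokenizer(prompt, return_tensors="pt")            # pre-processing: words -> numbers
print(inputs["input_ids"])                                 # token IDs the model actually sees

output_ids = model.generate(**inputs, max_new_tokens=20)   # inference: numbers in, numbers out
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # post-processing: numbers -> words
```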
Explore AI Inference Software, Tools, and Technologies
NVIDIA NIM
NVIDIA NIM™ provides easy-to-use microservices for secure, reliable deployment of high-performance AI inferencing across clouds, data centers, and workstations.
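As a minimal sketch of how an application might call a NIM microservice: NIM exposes an OpenAI-compatible API, so a standard OpenAI client can send requests to it. The local endpoint URL and the meta/llama3-8b-instruct model name below are assumptions for illustration; they depend on which NIM container you deploy and where it runs.

```python
# Hedged sketch: query a locally running NIM microservice through its
# OpenAI-compatible endpoint. The URL and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "What is AI inference?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```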
NVIDIA Triton Inference Server
Use NVIDIA Triton Inference Server™ to consolidate custom AI model-serving infrastructure, boost AI inferencing and prediction performance, and simplify the creation of custom AI pipelines with pre- and post-processing steps and business logic.
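To make the model-serving idea concrete, here is a hedged sketch of a client sending an inference request to a running Triton Inference Server using its Python HTTP client. The server address, the model name "my_model", and the tensor names "INPUT0"/"OUTPUT0" with their shapes are illustrative assumptions; they must match whatever model configuration the server actually hosts.

```python
# Hedged sketch: send one inference request to a Triton Inference Server
# assumed to be reachable at localhost:8000 and serving a model named
# "my_model" with an FP32 input "INPUT0" and an output "OUTPUT0".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 4).astype(np.float32)             # example input tensor
infer_input = httpclient.InferInput("INPUT0", data.shape, "FP32")
infer_input.set_data_from_numpy(data)

result = client.infer(
    model_name="my_model",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
)
print(result.as_numpy("OUTPUT0"))                           # model prediction as a NumPy array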
NVIDIA TensorRT
NVIDIA TensorRT™ includes an inference runtime and model optimizations that deliver low latency and high throughput for production applications. The TensorRT ecosystem includes TensorRT, TensorRT-LLM, TensorRT Model Optimizer, and TensorRT Cloud.
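The following is a hedged sketch of the TensorRT build step: parsing an ONNX model and producing an optimized, serialized engine for low-latency inference. The file name "model.onnx" is an assumption, and the exact API details vary between TensorRT versions; this follows the TensorRT 8.x-style Python API.

```python
# Hedged sketch: build a TensorRT engine from an ONNX file and enable FP16.
# "model.onnx" is an assumed input; API details vary by TensorRT version.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("Failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # allow reduced precision for lower latency

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)               # serialized engine, loadable by the TensorRT runtime
```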