
AI Inference

AI inference is the process of generating outputs from a model by providing it with inputs. These inputs and outputs can take many forms, such as images, text, or video, and they power applications ranging from weather forecasts to conversations with a large language model (LLM).

A process diagram showing how NVIDIA AI Inference works


How AI Inference Works

AI inference follows induction, also known as training. Induction is the process of creating a model by fitting an algorithm, such as a neural network, to labeled data. The model learns to predict expected outcomes by generalizing the patterns in the labeled training data. The model is then tested and validated on unseen data to ensure its quality. Once the model passes testing, it can be used in production for inference.

Inference is the process of providing unlabeled data to a trained model, which returns information or a label for that input. Inference powers many types of applications, such as LLMs, forecasts, and predictive analytics. At its core, all inference in neural networks is inputting numbers and outputting numbers; the processing of data before and after inference is what differentiates the types of inference. For example, with an LLM, a prompt must be converted into numbers before it is fed to the model, and the model's output numbers must be converted back into words.
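To make that concrete, the short sketch below tokenizes a prompt into numbers, runs the model to get numbers out, and decodes those numbers back into text. It assumes the Hugging Face Transformers library and the small GPT-2 model, which are used here purely as an illustration and are not part of the NVIDIA tools described on this page.

    # Illustration only: text -> numbers -> model -> numbers -> text.
    # Assumes the Hugging Face "transformers" package and the GPT-2 model,
    # chosen here only as a small, widely available example.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The weather tomorrow will be"
    inputs = tokenizer(prompt, return_tensors="pt")            # words -> token IDs (numbers)
    output_ids = model.generate(**inputs, max_new_tokens=20)   # numbers in, numbers out
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # numbers -> words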

Explore AI Inference Software, Tools, and Technologies

NVIDIA NIM

NVIDIA NIM™ provides easy-to-use microservices for secure, reliable deployment of high-performance AI inferencing across the cloud, data center, and workstations.
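As one illustration, LLM NIM microservices expose an OpenAI-compatible HTTP API, so a deployed microservice can be queried with a few lines of client code. The endpoint URL and model name below are placeholders that depend on the specific deployment.

    # Sketch: query a locally deployed NIM LLM microservice through its
    # OpenAI-compatible API. The base URL and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
    response = client.chat.completions.create(
        model="meta/llama3-8b-instruct",    # placeholder model identifier
        messages=[{"role": "user", "content": "What is AI inference?"}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)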

NVIDIA Triton Inference Server

Use NVIDIA Triton Inference Server™ to consolidate custom AI model-serving infrastructure, boost AI inferencing and prediction capabilities, and simplify the creation of custom AI pipelines with pre- and post-processing steps and business logic.
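For illustration, the sketch below sends an inference request to a running Triton server using the tritonclient Python package. The model name, tensor names, and input shape are assumptions; they must match the configuration of the model actually deployed on the server.

    # Sketch of a client request to a Triton Inference Server on localhost.
    # Model name, input/output tensor names, and shape are assumptions.
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    batch = np.random.rand(1, 3, 224, 224).astype(np.float32)   # dummy image batch
    infer_input = httpclient.InferInput("INPUT__0", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)

    result = client.infer(
        model_name="resnet50",                                  # placeholder model name
        inputs=[infer_input],
        outputs=[httpclient.InferRequestedOutput("OUTPUT__0")],
    )
    print(result.as_numpy("OUTPUT__0").shape)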

NVIDIA TensorRT

NVIDIA TensorRT™ includes an inference runtime and model optimizations that deliver low latency and high throughput for production applications. The TensorRT ecosystem includes TensorRT, TensorRT-LLM, TensorRT Model Optimizer, and TensorRT Cloud.
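As a minimal sketch of a typical TensorRT workflow, the TensorRT Python API (TensorRT 8.x style shown here) can compile an ONNX model into an optimized, serialized engine for low-latency inference. The file names and the FP16 setting below are illustrative assumptions.

    # Sketch: build a serialized TensorRT engine from an ONNX model (TensorRT 8.x style).
    # File names and the FP16 flag are illustrative assumptions.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:          # placeholder ONNX model path
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)        # enable FP16 optimizations if supported

    engine_bytes = builder.build_serialized_network(network, config)
    with open("model.plan", "wb") as f:          # serialized engine for deployment
        f.write(engine_bytes)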

AI Inference Learning Resources