AI Inference

Apr 03, 2024

New Lab: Generative AI Inference with NVIDIA NIM

Get started with NVIDIA NIM for deploying large language models (LLMs). Request access to a free, hands-on lab today.

1 MIN READ

Apr 02, 2024

Tune and Deploy LoRA LLMs with NVIDIA TensorRT-LLM

Large language models (LLMs) have revolutionized natural language processing (NLP) with their ability to learn from massive amounts of text and generate fluent...

15 MIN READ

Mar 27, 2024

NVIDIA H200 Tensor Core GPUs and NVIDIA TensorRT-LLM Set MLPerf LLM Inference Records

Generative AI is unlocking new computing applications that greatly augment human capability, enabled by continued model innovation. Generative AI...

11 MIN READ

Mar 18, 2024

NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference

What is the interest in trillion-parameter models? We know many of the use cases today and interest is growing due to the promise of an increased capacity for:...

9 MIN READ

Mar 07, 2024

NVIDIA TensorRT Accelerates Stable Diffusion Nearly 2x Faster with 8-bit Post-Training Quantization

In the dynamic realm of generative AI, diffusion models stand out as the most powerful architecture for generating high-quality images with text prompts. Models...

7 MIN READ

Feb 21, 2024

NVIDIA TensorRT-LLM Revs Up Inference for Google Gemma

NVIDIA is collaborating as a launch partner with Google in delivering Gemma, a newly optimized family of open models built from the same research and technology...

4 MIN READ

Feb 13, 2024

Top Inference for Large Language Models Sessions at NVIDIA GTC 2024

Learn how inference for LLMs is driving breakthrough performance for AI-enabled applications and services.

1 MIN READ

Feb 01, 2024

Deploy an AI Coding Assistant with NVIDIA TensorRT-LLM and NVIDIA Triton

Large language models (LLMs) have revolutionized the field of AI, creating entirely new ways of interacting with the digital world. While they provide a good...

12 MIN READ

Jan 29, 2024

Emulating the Attention Mechanism in Transformer Models with a Fully Convolutional Network

The past decade has seen a remarkable surge in the adoption of deep learning techniques for computer vision (CV) tasks. Convolutional neural networks (CNNs)...

13 MIN READ

Jan 11, 2024

Free Digital Webinar Series: How to Get Started with AI Inference

Learn how to improve your AI model performance with this series of expert-led talks on the NVIDIA AI inference platform.

1 MIN READ

Jan 08, 2024

Contest: Build Generative AI on NVIDIA RTX PCs

NVIDIA is announcing the Generative AI on RTX PCs Developer Contest, designed to inspire innovation within the developer community. Build and submit your next...

1 MIN READ

Jan 08, 2024

Supercharging LLM Applications on Windows PCs with NVIDIA RTX Systems

Large language models (LLMs) are fundamentally changing the way we interact with computers. These models are being incorporated into a wide range of...

5 MIN READ

Jan 08, 2024

Get Started with Generative AI Development for Windows PCs with NVIDIA RTX

Generative AI and large language models (LLMs) are changing human-computer interaction as we know it. Many use cases would benefit from running LLMs locally on...

4 MIN READ

Jan 04, 2024

Accelerating Inference on End-to-End Workflows with H2O.ai and NVIDIA

Data scientists are combining generative AI and predictive analytics to build the next generation of AI applications. In financial services, AI modeling and...

14 MIN READ

Dec 14, 2023

Achieving Top Inference Performance with the NVIDIA H100 Tensor Core GPU and NVIDIA TensorRT-LLM

Best-in-class AI performance requires an efficient parallel computing architecture, a productive tool stack, and deeply optimized algorithms. NVIDIA released...

4 MIN READ

Dec 14, 2023

Generative AI Research Spotlight: Demystifying Diffusion-Based Models

With Internet-scale data, the computational demands of AI-generated content have grown significantly, with data centers running full steam for weeks or months...

26 MIN READ