TensorRT-LLM

Oct 09, 2025

NVIDIA Blackwell Leads on SemiAnalysis InferenceMAX™ v1 Benchmarks

SemiAnalysis recently launched InferenceMAX™ v1, a new open source initiative that provides a comprehensive methodology to evaluate inference hardware...

11 MIN READ

Sep 17, 2025

An Introduction to Speculative Decoding for Reducing Latency in AI Inference

Generating text with large language models (LLMs) often involves running into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits...

11 MIN READ

Aug 22, 2025

NVIDIA Hardware Innovations and Open Source Contributions Are Shaping AI

Open source AI models such as Cosmos, DeepSeek, Gemma, GPT-OSS, Llama, Nemotron, Phi, Qwen, and many more are the foundation of AI innovation. These models are...

8 MIN READ

Aug 05, 2025

NVIDIA Accelerates OpenAI gpt-oss Models Delivering 1.5 M TPS Inference on NVIDIA GB200 NVL72

NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX back in 2016. The collaborative AI innovation continues with the OpenAI...

6 MIN READ

Jul 07, 2025

LLM Inference Benchmarking: Performance Tuning with TensorRT-LLM

This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to benchmark LLM inference...

11 MIN READ

Jun 25, 2025

Check Out Sovereign AI in Practice Through an NVIDIA Webinar

Join NVIDIA experts and leading European model builders on July 8 for a webinar on building and deploying multilingual large language models.

1 MIN READ

Jun 25, 2025

How to Streamline Complex LLM Workflows Using NVIDIA NeMo-Skills

A typical recipe for improving LLMs involves multiple stages: synthetic data generation (SDG), model training through supervised fine-tuning (SFT) or...

10 MIN READ

Jun 24, 2025

Introducing NVFP4 for Efficient and Accurate Low-Precision Inference

To get the most out of AI, optimizations are critical. When developers think about optimizing AI models for inference, model compression techniques—such as...

11 MIN READ

May 30, 2025

NVIDIA Deep Learning Institute Offers Multilingual AI Training at GTC Paris

Large language models (LLMs) are capable of recognizing, summarizing, translating, predicting, and generating content. Yet even the most powerful LLMs face...

6 MIN READ

May 22, 2025

Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick

NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over...

9 MIN READ

May 02, 2025

Integrate and Deploy Tongyi Qwen3 Models into Production Applications with NVIDIA

Alibaba recently released Tongyi Qwen3, a family of open-source hybrid-reasoning large language models (LLMs). The Qwen3 family consists of two MoE models,...

7 MIN READ

Apr 24, 2025

Benchmarking Agentic LLM and VLM Reasoning for Gaming with NVIDIA NIM

This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. ...

7 MIN READ

Apr 02, 2025

NVIDIA Blackwell Delivers Massive Performance Leaps in MLPerf Inference v5.0

The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency...

10 MIN READ

Apr 02, 2025

LLM Inference Benchmarking: Fundamental Concepts

This is the first post in the large language model latency-throughput benchmarking series, which aims to instruct developers on common metrics used for LLM...

15 MIN READ

Mar 20, 2025

Boost Llama Model Performance on Microsoft Azure AI Foundry with NVIDIA TensorRT-LLM

Microsoft, in collaboration with NVIDIA, announced transformative performance improvements for the Meta Llama family of models on its Azure AI Foundry platform....

4 MIN READ

Mar 18, 2025

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over...

14 MIN READ