Posts by Neelay Shah
Generative AI
May 06, 2025
LLM Inference Benchmarking Guide: NVIDIA GenAI-Perf and NIM
This is the second post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. ...
11 MIN READ
Generative AI
Apr 02, 2025
LLM Inference Benchmarking: Fundamental Concepts
This is the first post in the large language model latency-throughput benchmarking series, which aims to instruct developers on common metrics used for LLM...
15 MIN READ
Development & Optimization
Mar 18, 2025
NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models
NVIDIA announced the release of NVIDIA Dynamo at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying...
14 MIN READ
Data Center / Cloud
Mar 07, 2024
Generate Stunning Images with Stable Diffusion XL on the NVIDIA AI Inference Platform
Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by...
14 MIN READ