Neelay Shah

Neelay Shah is the principal software architect for NVIDIA Triton Inference Server and an AI solutions engineer. He focuses on helping developers move smoothly from prototyping to high-performance production deployments at scale. Before joining NVIDIA, Neelay was a principal engineer at Intel, where he led open source projects for computer vision pipelines. He has a bachelor’s degree in computer science from Williams College and a master’s degree in computer science from UIUC.

Posts by Neelay Shah

Agentic AI / Generative AI May 06, 2025

LLM Inference Benchmarking Guide: NVIDIA GenAI-Perf and NIM

This is the second post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM. ... 11 MIN READ

Agentic AI / Generative AI Apr 02, 2025

LLM Inference Benchmarking: Fundamental Concepts

This is the first post in the large language model latency-throughput benchmarking series, which aims to instruct developers on common metrics used for LLM... 15 MIN READ

Developer Tools & Techniques Mar 18, 2025

NVIDIA Dynamo, A Low-Latency Distributed Inference Framework for Scaling Reasoning AI Models

NVIDIA announced the release of NVIDIA Dynamo at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying... 14 MIN READ

Data Center / Cloud Mar 07, 2024

Generate Stunning Images with Stable Diffusion XL on the NVIDIA AI Inference Platform

Diffusion models are transforming creative workflows across industries. These models generate stunning images based on simple text or image inputs by... 14 MIN READ