Brian Pharris

Brian Pharris is a senior distinguished engineer and the technical lead for GPU-accelerated inference, shaping the architecture, performance, and scalability of some of the world's most advanced AI systems. He holds both B.S. and M.Eng. degrees in electrical engineering and computer science from MIT.

Posts by Brian Pharris

Agentic AI / Generative AI Jul 10, 2026

AI Model Co-Design: Hardware-Friendly LLM Design

AI performance comes down to three dimensions. Accuracy: how well the model reasons and produces outputs. Throughput: how many tokens per second a... 17 MIN READ

Agentic AI / Generative AI Nov 01, 2024

3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot

Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input... 5 MIN READ

Agentic AI / Generative AI Sep 26, 2024

Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance

Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding... 8 MIN READ

Agentic AI / Generative AI Sep 05, 2024

Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch

As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that... 5 MIN READ

Simulation / Modeling / Design Sep 08, 2022

Full-Stack Innovation Fuels Highest MLPerf Inference 2.1 Results for NVIDIA

Today’s AI-powered applications are enabling richer experiences, fueled by both larger and more complex AI models as well as the application of many models in... 14 MIN READ

Simulation / Modeling / Design Apr 23, 2018

Nv-Wavenet: Better Speech Synthesis Using GPU-Enabled WaveNet Inference

WaveNets represent an exciting new neural network architecture used to generate raw audio waveforms, including the ability to synthesize very high quality... 10 MIN READ