Brian Pharris

Brian is a principal architect in the Compute Architecture group at NVIDIA, where his most recent focus is GPU-accelerated deep learning inference. He holds BS and MEng degrees in Electrical Engineering and Computer Science from MIT.

Posts by Brian Pharris

Generative AI

3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot

Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input... 5 MIN READ
Generative AI

Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance

Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding... 8 MIN READ
Generative AI

Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch

As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that... 5 MIN READ
Simulation / Modeling / Design

Full-Stack Innovation Fuels Highest MLPerf Inference 2.1 Results for NVIDIA

Today’s AI-powered applications are enabling richer experiences, fueled by both larger and more complex AI models as well as the application of many models in... 14 MIN READ
Simulation / Modeling / Design

Nv-Wavenet: Better Speech Synthesis Using GPU-Enabled WaveNet Inference

WaveNets represent an exciting new neural network architecture used to generate raw audio waveforms, including the ability to synthesize very high quality... 10 MIN READ