Posts by Brian Pharris
Generative AI
Nov 01, 2024
3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot
Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input...
5 MIN READ
Generative AI
Sep 26, 2024
Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance
Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding...
8 MIN READ
Generative AI
Sep 05, 2024
Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch
As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that...
5 MIN READ
Simulation / Modeling / Design
Sep 08, 2022
Full-Stack Innovation Fuels Highest MLPerf Inference 2.1 Results for NVIDIA
Today’s AI-powered applications are enabling richer experiences, fueled by larger and more complex AI models as well as the application of many models in...
14 MIN READ
Simulation / Modeling / Design
Apr 23, 2018
nv-wavenet: Better Speech Synthesis Using GPU-Enabled WaveNet Inference
WaveNet is an exciting neural network architecture for generating raw audio waveforms, capable of synthesizing very high quality...
10 MIN READ