Posts by Brian Slechta
Data Center / Cloud
Oct 09, 2024
Boosting Llama 3.1 405B Throughput by Another 1.5x on NVIDIA H200 Tensor Core GPUs and NVLink Switch
The continued growth in LLM capability, fueled by increasing parameter counts and support for longer contexts, has led to their use in a wide variety of...
8 MIN READ
Generative AI
Sep 26, 2024
Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance
Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding...
8 MIN READ
Generative AI
Sep 05, 2024
Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch
As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that...
5 MIN READ
Generative AI
Aug 12, 2024
NVIDIA NVLink and NVIDIA NVSwitch Supercharge Large Language Model Inference
Large language models (LLMs) are getting larger, increasing the amount of compute required to process inference requests. To meet real-time latency requirements...
8 MIN READ
Data Center / Cloud
Jun 12, 2024
Demystifying AI Inference Deployments for Trillion Parameter Large Language Models
AI is transforming every industry, addressing grand human scientific challenges such as precision drug discovery and the development of autonomous vehicles, as...
14 MIN READ