Brian Slechta

Brian Slechta is a director of AI architecture in the GPU Architecture group at NVIDIA. He is passionate about pushing the boundaries of hardware and software performance in the data center for large-scale AI workloads. Brian holds an M.Sc. in computer systems engineering from the University of Illinois at Urbana-Champaign.

Posts by Brian Slechta

Generative AI / LLMs

Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch

As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that...
Generative AI / LLMs

NVIDIA NVLink and NVIDIA NVSwitch Supercharge Large Language Model Inference

Large language models (LLMs) are getting larger, increasing the amount of compute required to process inference requests. To meet real-time latency requirements...
Data Center / Cloud

Demystifying AI Inference Deployments for Trillion Parameter Large Language Models

AI is transforming every industry, addressing grand human scientific challenges such as precision drug discovery and the development of autonomous vehicles, as...