Hao Guo

Hao Guo is a software engineer at NVIDIA, focusing on model optimization and LLM inference. He received his master’s degree in Computer Science from the University of Illinois at Urbana-Champaign.

Posts by Hao Guo

Data Center / Cloud

An Introduction to Speculative Decoding for Reducing Latency in AI Inference

Generating text with large language models (LLMs) often involves running into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits...