Posts by Chenhan Yu
Data Center / Cloud
Sep 17, 2025
An Introduction to Speculative Decoding for Reducing Latency in AI Inference
Generating text with large language models (LLMs) often runs into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits...
11 MIN READ
Agentic AI / Generative AI
Sep 10, 2024
Post-Training Quantization of LLMs with NVIDIA NeMo and NVIDIA TensorRT Model Optimizer
As large language models (LLMs) grow ever larger, it is increasingly important to provide easy-to-use and efficient deployment paths because the cost of...
10 MIN READ