Chenhan Yu

Chenhan Yu is an engineering manager at NVIDIA, working on inference and deployment system software optimization for generative AI and autonomous driving. He received his Ph.D. in computer science from the University of Texas at Austin.

Posts by Chenhan Yu

Data Center / Cloud

An Introduction to Speculative Decoding for Reducing Latency in AI Inference

Generating text with large language models (LLMs) often involves running into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits... 11 MIN READ
Agentic AI / Generative AI

Post-Training Quantization of LLMs with NVIDIA NeMo and NVIDIA TensorRT Model Optimizer

As large language models (LLMs) are becoming even bigger, it is increasingly important to provide easy-to-use and efficient deployment paths because the cost of... 10 MIN READ