Huizi Mao

Huizi Mao is a tech lead and senior engineer with the Deep Learning Algorithm and Software team at NVIDIA, where he leads the overall development of TensorRT Model Optimizer. Huizi joined NVIDIA through the acquisition of OmniML, Inc., where he was co-founder and CTO. He received his PhD in Electrical Engineering from Stanford and his bachelor’s degree from Tsinghua University.

Posts by Huizi Mao

Data Center / Cloud Dec 08, 2025

Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache

Quantization is one of the strongest levers for large-scale inference. By reducing the precision of weights, activations, and KV cache, we can reduce the memory... 10 MIN READ

Agentic AI / Generative AI Sep 11, 2025

How Quantization Aware Training Enables Low-Precision Accuracy Recovery

After training AI models, a variety of compression techniques can be used to optimize them for deployment. The most common is post-training quantization (PTQ),... 10 MIN READ

Agentic AI / Generative AI Aug 29, 2025

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

Major open-source foundational model releases are an exciting time for the AI community, bringing unique architectural innovations and capabilities. As the... 7 MIN READ

Data Center / Cloud Aug 01, 2025

Optimizing LLMs for Performance and Accuracy with Post-Training Quantization

Quantization is a core tool for developers aiming to improve inference performance with minimal overhead. It delivers significant gains in latency, throughput,... 14 MIN READ

Agentic AI / Generative AI Mar 18, 2025

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over... 14 MIN READ

Agentic AI / Generative AI May 08, 2024

Accelerate Generative AI Inference Performance with NVIDIA TensorRT Model Optimizer, Now Publicly Available

In the fast-evolving landscape of generative AI, the demand for accelerated inference speed remains a pressing concern. With the exponential growth in model... 9 MIN READ