Huizi Mao

Huizi Mao is a tech lead and senior engineer with the Deep Learning Algorithm and Software team at NVIDIA, leading the overall development of TensorRT Model Optimizer. Huizi joined NVIDIA through the acquisition of OmniML, Inc., where he was the co-founder and CTO. He received his PhD in Electrical Engineering from Stanford University and his bachelor’s degree from Tsinghua University.

Posts by Huizi Mao

Generative AI

How Quantization Aware Training Enables Low-Precision Accuracy Recovery

After training AI models, a variety of compression techniques can be used to optimize them for deployment. The most common is post-training quantization (PTQ),... 10 MIN READ
Generative AI

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

Major open-source foundational model releases are an exciting time for the AI community, bringing unique architectural innovations and capabilities. As the... 7 MIN READ
Data Center / Cloud

Optimizing LLMs for Performance and Accuracy with Post-Training Quantization

Quantization is a core tool for developers aiming to improve inference performance with minimal overhead. It delivers significant gains in latency, throughput,... 14 MIN READ
Generative AI

NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance

NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over... 14 MIN READ
Generative AI

Accelerate Generative AI Inference Performance with NVIDIA TensorRT Model Optimizer, Now Publicly Available

In the fast-evolving landscape of generative AI, the demand for accelerated inference speed remains a pressing concern. With the exponential growth in model... 9 MIN READ