Posts by Huizi Mao
Agentic AI / Generative AI
Sep 11, 2025
How Quantization Aware Training Enables Low-Precision Accuracy Recovery
After training AI models, a variety of compression techniques can be used to optimize them for deployment. The most common is post-training quantization (PTQ),...
10 MIN READ
Agentic AI / Generative AI
Aug 29, 2025
Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training
Major open-source foundational model releases are exciting moments for the AI community, bringing unique architectural innovations and capabilities. As the...
7 MIN READ
Data Center / Cloud
Aug 01, 2025
Optimizing LLMs for Performance and Accuracy with Post-Training Quantization
Quantization is a core tool for developers aiming to improve inference performance with minimal overhead. It delivers significant gains in latency, throughput,...
14 MIN READ
Agentic AI / Generative AI
Mar 18, 2025
NVIDIA Blackwell Delivers World-Record DeepSeek-R1 Inference Performance
NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over...
14 MIN READ
Agentic AI / Generative AI
May 08, 2024
Accelerate Generative AI Inference Performance with NVIDIA TensorRT Model Optimizer, Now Publicly Available
In the fast-evolving landscape of generative AI, the demand for accelerated inference speed remains a pressing concern. With the exponential growth in model...
9 MIN READ