Wei-Ming Chen

Wei-Ming Chen is a senior engineer on the Deep Learning Algorithm and Software team at NVIDIA, specializing in efficient deep learning and model deployment. Before joining NVIDIA, he was a postdoctoral associate at MIT working with Prof. Song Han. Wei-Ming received his PhD, master’s, and bachelor’s degrees in Computer Science from National Taiwan University.

Posts by Wei-Ming Chen

Data Center / Cloud Dec 08, 2025

Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache

Quantization is one of the strongest levers for large-scale inference. By reducing the precision of weights, activations, and KV cache, we can reduce the memory...

Data Center / Cloud Aug 01, 2025

Optimizing LLMs for Performance and Accuracy with Post-Training Quantization

Quantization is a core tool for developers aiming to improve inference performance with minimal overhead. It delivers significant gains in latency, throughput,...

Agentic AI / Generative AI Aug 15, 2024

NVIDIA TensorRT Model Optimizer v0.15 Boosts Inference Performance and Expands Model Support

NVIDIA has announced the latest v0.15 release of NVIDIA TensorRT Model Optimizer, a state-of-the-art quantization toolkit of model optimization techniques...