Chen Xiaoming

Chen Xiaoming is a principal architect and senior manager at NVIDIA, interested in algorithm/software/hardware co-design for deep learning models. He has recently been working on performance modeling, benchmarking, analysis, and optimization for large language model inference.

Posts by Chen Xiaoming

Data Center / Cloud

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

Modern AI workloads have moved well beyond single-GPU inference serving. Model parallelism, which efficiently splits computation across many GPUs, is now the... 10 MIN READ