Kaiyu Xie

Kaiyu Xie is a senior architect at NVIDIA who has been working on TensorRT-LLM, focusing on general performance optimization and system implementation.

Posts by Kaiyu Xie

Data Center / Cloud Oct 20, 2025

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

Modern AI workloads have moved well beyond single-GPU inference serving. Model parallelism, which efficiently splits computation across many GPUs, is now the...