Kaiyu Xie

Kaiyu Xie is a senior architect at NVIDIA who has been working on TensorRT-LLM, focusing on general performance optimization and system implementation.

Posts by Kaiyu Xie

Data Center / Cloud

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

Modern AI workloads have moved well beyond single-GPU inference serving. Model parallelism, which efficiently splits computation across many GPUs, is now the...

10 MIN READ