Jun Yang

Jun Yang is a senior engineering director at NVIDIA, where he focuses on end-to-end AI workload optimization. He currently leads the overall engineering effort for NVIDIA TensorRT-LLM. He holds a master’s degree in computer architecture from the Institute of Computing Technology, Chinese Academy of Sciences.

Posts by Jun Yang

Data Center / Cloud

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

Modern AI workloads have moved well beyond single-GPU inference serving. Model parallelism, which efficiently splits computation across many GPUs, is now the... 10 MIN READ
Data Center / Cloud

NVIDIA Blackwell Platform Sets New LLM Inference Records in MLPerf Inference v4.1

Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a... 13 MIN READ