Jun Yang

Jun Yang is a senior engineering director at NVIDIA, where he focuses on end-to-end AI workload optimization. He currently leads the overall engineering effort for NVIDIA TensorRT-LLM. He holds a master’s degree in Computer Architecture from the Institute of Computing Technology, Chinese Academy of Sciences.

Posts by Jun Yang

Data Center / Cloud Oct 20, 2025

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

Modern AI workloads have moved well beyond single-GPU inference serving. Model parallelism, which efficiently splits computation across many GPUs, is now the... 11 MIN READ

Data Center / Cloud Aug 28, 2024

NVIDIA Blackwell Platform Sets New LLM Inference Records in MLPerf Inference v4.1

Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a... 13 MIN READ