Dongxu Yang

Dongxu Yang is a principal architect at NVIDIA, working on parallel computing and performance optimization. He has recently focused on development and optimization for large language model inference. He received his B.S. and M.S. degrees from Tsinghua University, China, in 2008 and 2011, respectively.

Posts by Dongxu Yang

Data Center / Cloud Oct 20, 2025

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

Modern AI workloads have moved well beyond single-GPU inference serving. Model parallelism, which efficiently splits computation across many GPUs, is now the... 11 MIN READ

Data Science Apr 03, 2024

Optimizing Memory and Retrieval for Graph Neural Networks with WholeGraph, Part 2

Large-scale graph neural network (GNN) training presents formidable challenges, particularly concerning the scale and complexity of graph data. These challenges... 5 MIN READ

Data Science Mar 08, 2024

Optimizing Memory and Retrieval for Graph Neural Networks with WholeGraph, Part 1

Graph neural networks (GNNs) have revolutionized machine learning for graph-structured data. Unlike traditional neural networks, GNNs are good at capturing... 9 MIN READ