Dongxu Yang

Dongxu Yang is a principal architect at NVIDIA, working on parallel computing and performance optimization. He has recently focused on development and optimization for large language model inference. He received his B.S. and M.S. degrees from Tsinghua University, China, in 2008 and 2011, respectively.

Posts by Dongxu Yang

Data Center / Cloud

Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems

Modern AI workloads have moved well beyond single-GPU inference serving. Model parallelism, which efficiently splits computation across many GPUs, is now the... 10 MIN READ
Data Science

Optimizing Memory and Retrieval for Graph Neural Networks with WholeGraph, Part 2

Large-scale graph neural network (GNN) training presents formidable challenges, particularly concerning the scale and complexity of graph data. These challenges... 5 MIN READ
Data Science

Optimizing Memory and Retrieval for Graph Neural Networks with WholeGraph, Part 1

Graph neural networks (GNNs) have revolutionized machine learning for graph-structured data. Unlike traditional neural networks, GNNs are good at capturing... 9 MIN READ