Fan Yu

Fan Yu is an AI Developer Technology Engineer at NVIDIA, where he works on the NVIDIA Merlin HugeCTR embedding cache and other NVIDIA Merlin components. His work focuses mainly on performance optimization of HPC and AI workloads across NVIDIA architectures and platforms. Fan holds a master’s degree in computer science from the Australian National University, where he researched computer system architecture and performance optimization on supercomputers.

Posts by Fan Yu

Networking / Communications Feb 02, 2026

Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel

In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. EP communication is essentially all-to-all,... 11 MIN READ

Data Science Aug 31, 2022

Scaling Recommendation System Inference with NVIDIA Merlin Hierarchical Parameter Server

Recommendation systems are widely used today to personalize user experiences and improve customer engagement in various settings like e-commerce, social media,... 11 MIN READ