Fernando Xiong

Fernando Xiong is a senior architect in the Compute Architecture group at NVIDIA, focusing on speculative decoding, performance optimization for LLM inference, and AI agent systems for software engineering. Fernando received his master’s degree in Computer Science from Renmin University of China.
Posts by Fernando Xiong

Data Center / Cloud

Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

As AI systems move from single-turn interactions to coordinated multiagent workflows, low-latency inference becomes increasingly important. Autoregressive LLMs... 7 MIN READ