Fernando Xiong

Fernando Xiong is a senior architect in the Compute Architecture group at NVIDIA, focusing on speculative decoding, performance optimization for LLM inference, and AI agent systems for software engineering. Fernando received his master’s degree in Computer Science from Renmin University of China.

Posts by Fernando Xiong

Data Center / Cloud Jun 23, 2026

Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

As AI systems move from single-turn interactions to coordinated multiagent workflows, low-latency inference becomes increasingly important. Autoregressive LLMs...

7 MIN READ