Kai Xu

Kai Xu is a senior engineer with the Deep Learning Algorithm and Software team at NVIDIA, specializing in optimizing inference efficiency for generative AI. He was an early engineer at OmniML prior to its acquisition by NVIDIA. He received his Ph.D. in Computer Engineering from Arizona State University.

Posts by Kai Xu

Agentic AI / Generative AI

Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT-LLM

For machine learning engineers deploying LLMs at scale, the equation is familiar and unforgiving: as context length increases, attention computation costs...

6 MIN READ