Bo Li

Bo Li is a senior DevTech Compute engineer at NVIDIA, working on accelerating AI at scale. His current focus is efficient LLM inference, ranging from low-level GPU optimization to system design. He also has experience in generative AI modeling and computer graphics. He received his master's degree in Computer Science from ETH Zurich and his bachelor's degree from Peking University.

Posts by Bo Li

Agentic AI / Generative AI

Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT-LLM

For machine learning engineers deploying LLMs at scale, the equation is familiar and unforgiving: as context length increases, attention computation costs... 6 MIN READ