John Thomson

John Thomson is an intern on the Deep Learning Algorithms team at NVIDIA. He’s currently in his third year of Computer Engineering at the University of Waterloo. His area of focus is optimizing LLM inference on structured workloads.

Posts by John Thomson

Agentic AI / Generative AI Jan 16, 2025

Introducing New KV Cache Reuse Optimizations in NVIDIA TensorRT-LLM

Language models generate text by predicting the next token, given all the previous tokens, including the input text tokens. Key and value elements of the... 7 MIN READ