John Thomson

John Thomson is an intern on the Deep Learning Algorithms team at NVIDIA. He’s currently in his third year of Computer Engineering at the University of Waterloo. His area of focus is optimizing LLM inference on structured workloads.

Posts by John Thomson

Generative AI

Introducing New KV Cache Reuse Optimizations in NVIDIA TensorRT-LLM

Language models generate text by predicting the next token, given all the previous tokens including the input text tokens. Key and value elements of the...

7 MIN READ