Thor Johnsen

Thor Johnsen joined the NVIDIA deep learning frameworks team in 2018 and has worked on various TensorFlow and PyTorch projects, vision and language models, and MLPerf training. His most recent work, in TensorRT-LLM, has focused on KV cache optimizations. Prior to joining NVIDIA, he worked on scientific computing applications for the energy industry.

Posts by Thor Johnsen

Generative AI

5x Faster Time to First Token with NVIDIA TensorRT-LLM KV Cache Early Reuse

In our previous blog post, we demonstrated how reusing the key-value (KV) cache by offloading it to CPU memory can accelerate time to first token (TTFT) by up...

5 MIN READ