Laikh Tewari

Laikh Tewari is part of the AI Platform Software group at NVIDIA, where he manages products for optimizing LLM inference performance. Laikh received his B.S. and M.S. in computer science from Stanford University, where he specialized in systems and AI.
Posts by Laikh Tewari

Generative AI

Introducing New KV Cache Reuse Optimizations in NVIDIA TensorRT-LLM

Language models generate text by predicting the next token, given all the previous tokens including the input text tokens. Key and value elements of the... 7 MIN READ
Generative AI

TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x

NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that... 9 MIN READ