Posts by Laikh Tewari
Agentic AI / Generative AI
Dec 16, 2025
Accelerating Long-Context Inference with Skip Softmax in NVIDIA TensorRT-LLM
For machine learning engineers deploying LLMs at scale, the equation is familiar and unforgiving: as context length increases, attention computation costs...
6 MIN READ
Developer Tools & Techniques
Jul 07, 2025
Think Smart and Ask an Encyclopedia-Sized Question: Multi-Million Token Real-Time Inference for 32X More Users
Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents...
8 MIN READ
Agentic AI / Generative AI
Jan 16, 2025
Introducing New KV Cache Reuse Optimizations in NVIDIA TensorRT-LLM
Language models generate text by predicting the next token, given all previous tokens, including the input text tokens. Key and value elements of the...
7 MIN READ
Agentic AI / Generative AI
Dec 02, 2024
TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x
NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that...
9 MIN READ