Posts by Laikh Tewari
Development & Optimization
Jul 07, 2025
Think Smart and Ask an Encyclopedia-Sized Question: Multi-Million Token Real-Time Inference for 32X More Users
Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents...
8 MIN READ
Generative AI
Jan 16, 2025
Introducing New KV Cache Reuse Optimizations in NVIDIA TensorRT-LLM
Language models generate text by predicting the next token, given all previous tokens, including those of the input text. The key and value elements of the...
7 MIN READ
Generative AI
Dec 02, 2024
TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x
NVIDIA TensorRT-LLM support for speculative decoding now delivers more than a 3x speedup in total token throughput. TensorRT-LLM is an open-source library that...
9 MIN READ