Sharan Chetlur

Sharan Chetlur is a lead engineer working on TRT-LLM. Over the past decade, he has held various roles at NVIDIA leading the development of libraries for deep learning and HPC (cuDNN and cuBLAS) as well as NVIDIA submissions to the MLPerf training benchmark. He also had a stint at an AI hardware startup managing their team of kernel developers.

Posts by Sharan Chetlur

Generative AI

TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x

NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that... 9 MIN READ
Data Center / Cloud

Streamlining AI Inference Performance and Deployment with NVIDIA TensorRT-LLM Chunked Prefill

In this blog post, we take a closer look at chunked prefill, a feature of NVIDIA TensorRT-LLM that increases GPU utilization and simplifies the deployment... 4 MIN READ