Sharan Chetlur

Sharan Chetlur is a lead engineer working on NVIDIA TensorRT-LLM. Over the past decade, he has held various roles at NVIDIA, leading the development of libraries for deep learning and HPC (cuDNN and cuBLAS) as well as NVIDIA submissions to the MLPerf training benchmark. He also had a stint at an AI hardware startup, managing its team of kernel developers.

Posts by Sharan Chetlur

Agentic AI / Generative AI Dec 02, 2024

TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x

NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that... 9 MIN READ

Data Center / Cloud Nov 15, 2024

Streamlining AI Inference Performance and Deployment with NVIDIA TensorRT-LLM Chunked Prefill

In this blog post, we take a closer look at chunked prefill, a feature of NVIDIA TensorRT-LLM that increases GPU utilization and simplifies the deployment... 4 MIN READ