Nick Comly

Nick Comly leads products for inference optimization at NVIDIA. His team focuses on pushing the capabilities and performance of the NVIDIA stack for GenAI developers. Nick received his M.S. from Stanford University, where he specialized in deep learning and optimization.

Posts by Nick Comly

Developer Tools & Techniques Jun 13, 2025

Run High-Performance LLM Inference Kernels from NVIDIA Using FlashInfer

Best-in-class LLM inference requires two key elements: speed and developer velocity. Speed refers to maximizing the efficiency of the underlying hardware by... 6 MIN READ

Optimize AI Inference Performance with NVIDIA Full-Stack Solutions

The explosion of AI-driven applications has placed unprecedented demands on both developers, who must balance delivering cutting-edge performance with managing... 9 MIN READ

Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs

Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are... 6 MIN READ

Data Center / Cloud Nov 15, 2024

Streamlining AI Inference Performance and Deployment with NVIDIA TensorRT-LLM Chunked Prefill

In this blog post, we take a closer look at chunked prefill, a feature of NVIDIA TensorRT-LLM that increases GPU utilization and simplifies the deployment... 4 MIN READ

Agentic AI / Generative AI Nov 08, 2024

5x Faster Time to First Token with NVIDIA TensorRT-LLM KV Cache Early Reuse

In our previous blog post, we demonstrated how reusing the key-value (KV) cache by offloading it to CPU memory can accelerate time to first token (TTFT) by up... 5 MIN READ

Agentic AI / Generative AI Nov 01, 2024

3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot

Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input... 5 MIN READ