Nick Comly

Nick Comly leads products for inference optimization at NVIDIA. His team focuses on pushing the capabilities and performance of the NVIDIA stack for GenAI developers. Nick received his M.S. from Stanford University, where he specialized in deep learning and optimization.
Posts by Nick Comly

Top Stories

Optimize AI Inference Performance with NVIDIA Full-Stack Solutions

The explosion of AI-driven applications has placed unprecedented demands on both developers, who must balance delivering cutting-edge performance with managing... 9 MIN READ
Top Stories

Llama 3.2 Full-Stack Optimizations Unlock High Performance on NVIDIA GPUs

Meta recently released its Llama 3.2 series of vision language models (VLMs), which come in 11B parameter and 90B parameter variants. These models are... 6 MIN READ
Data Center / Cloud

Streamlining AI Inference Performance and Deployment with NVIDIA TensorRT-LLM Chunked Prefill

In this blog post, we take a closer look at chunked prefill, a feature of NVIDIA TensorRT-LLM that increases GPU utilization and simplifies the deployment... 4 MIN READ
Generative AI

5x Faster Time to First Token with NVIDIA TensorRT-LLM KV Cache Early Reuse

In our previous blog post, we demonstrated how reusing the key-value (KV) cache by offloading it to CPU memory can accelerate time to first token (TTFT) by up... 5 MIN READ
Generative AI

3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot

Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input... 5 MIN READ
Generative AI

NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models

Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing... 7 MIN READ