Nick Comly

Nick Comly is the product manager for Deep Learning Inference at NVIDIA. He works to bring the power of TensorRT inference optimizations directly to frameworks like PyTorch, TensorFlow, MXNet, ONNX, and PaddlePaddle. Nick received his M.S. from Stanford University, where he specialized in deep learning and optimization.
Posts by Nick Comly

Generative AI / LLMs

NVIDIA TensorRT-LLM Enhancements Deliver Massive Large Language Model Speedups on NVIDIA H200

Large language models (LLMs) have seen dramatic growth over the last year, and the challenge of delivering great user experiences depends on both high-compute...
5 MIN READ
Generative AI / LLMs

Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM, Now Publicly Available

Today, NVIDIA announces the public release of TensorRT-LLM to accelerate and optimize inference performance for the latest LLMs on NVIDIA GPUs. This open-source...
10 MIN READ
Top Stories

NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs

Large language models (LLMs) offer incredible new capabilities, expanding the frontier of what is possible with AI. However, their large size and unique...
9 MIN READ
Data Science

Optimizing and Serving Models with NVIDIA TensorRT and NVIDIA Triton

Imagine that you have trained your model with PyTorch, TensorFlow, or the framework of your choice, are satisfied with its accuracy, and are considering...
11 MIN READ