Nick Comly

Nick Comly is the product manager for Deep Learning Inference at NVIDIA. He works to bring the power of TensorRT inference optimizations directly to frameworks like PyTorch, TensorFlow, MXNet, ONNX, and PaddlePaddle. Nick received his M.S. from Stanford University, where he specialized in deep learning and optimizations.

Posts by Nick Comly

Generative AI

Achieving High Mixtral 8x7B Performance with NVIDIA H100 Tensor Core GPUs and TensorRT-LLM

As large language models (LLMs) continue to grow in size and complexity, the performance requirements for serving them quickly and cost-effectively continue to... 9 MIN READ
Generative AI

NVIDIA TensorRT 10.0 Upgrades Usability, Performance, and AI Model Support

NVIDIA today announced the latest release of NVIDIA TensorRT, an ecosystem of APIs for high-performance deep learning inference. TensorRT includes inference... 7 MIN READ
Generative AI

NVIDIA TensorRT-LLM Enhancements Deliver Massive Large Language Model Speedups on NVIDIA H200

Large language models (LLMs) have seen dramatic growth over the last year, and the challenge of delivering great user experiences depends on both high-compute... 5 MIN READ
Generative AI

Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM, Now Publicly Available

Today, NVIDIA announces the public release of TensorRT-LLM to accelerate and optimize inference performance for the latest LLMs on NVIDIA GPUs. This open-source... 10 MIN READ
Top Stories

NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs

Large language models (LLMs) offer incredible new capabilities, expanding the frontier of what is possible with AI. However, their large size and unique... 9 MIN READ
Data Science

Optimizing and Serving Models with NVIDIA TensorRT and NVIDIA Triton

Imagine that you have trained your model with PyTorch, TensorFlow, or the framework of your choice, are satisfied with its accuracy, and are considering... 11 MIN READ