Carl (Izzy) Putterman

Carl (Izzy) Putterman is a senior deep learning algorithms engineer. He graduated from the University of California, Berkeley in 2021 with BAs in applied mathematics and computer science. With NVIDIA, he currently works on NIM LLM Performance and has previously worked on time series modeling and graph neural networks.

Posts by Carl (Izzy) Putterman

Agentic AI / Generative AI Dec 17, 2024

Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding

Meta's Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only... 8 MIN READ

Image of the TensorRT-LLM icon next to multiple other icons of computer activities.

Agentic AI / Generative AI Dec 02, 2024

TensorRT-LLM Speculative Decoding Boosts Inference Throughput by up to 3.6x

NVIDIA TensorRT-LLM support for speculative decoding now provides over 3x the speedup in total token throughput. TensorRT-LLM is an open-source library that... 9 MIN READ

Data Center / Cloud Nov 15, 2024

NVIDIA NIM 1.4 Ready to Deploy with 2.4x Faster Inference

The demand for ready-to-deploy high-performance inference is growing as generative AI reshapes industries. NVIDIA NIM provides production-ready microservice... 3 MIN READ

Computer Vision / Video Analytics Aug 30, 2022

Beating SOTA Inference Performance on NVIDIA GPUs with GPUNet

Crafted by AI for AI, GPUNet is a class of convolutional neural networks designed to maximize the performance of NVIDIA GPUs using NVIDIA TensorRT. Built using... 6 MIN READ

Data Science Feb 15, 2022

Time Series Forecasting with the NVIDIA Time Series Prediction Platform and Triton Inference Server

In this post, we detail the recently released NVIDIA Time Series Prediction Platform (TSPP), a tool designed to compare easily and experiment with arbitrary... 13 MIN READ