Posts by Seonghee Lee
Developer Tools & Techniques
Jun 26, 2026
Creating the NVIDIA Nemotron 3 Ultra NVFP4 Checkpoint with NVIDIA Model Optimizer
As context windows grow longer, moving large model weights efficiently becomes critical to performance. A common way to address this is quantization, an...
16 MIN READ
Developer Tools & Techniques
Mar 09, 2026
Enhancing Distributed Inference Performance with the NVIDIA Inference Transfer Library
Deploying large language models (LLMs) requires large-scale distributed inference, which spreads model computation and request handling across many GPUs and...
13 MIN READ
Data Center / Cloud
Feb 25, 2026
Making Softmax More Efficient with NVIDIA Blackwell Ultra
LLM context lengths are exploding, and architectures are moving toward complex attention schemes like Multi-Head Latent Attention (MLA) and Grouped Query...
10 MIN READ