Eduardo Alvarez

Eduardo Alvarez is a senior technical lead at NVIDIA, where he focuses on AI inference at scale, performance optimization, workload economic analysis, and application enablement. He has a deep background in AI systems engineering, workload optimization, and accelerated computing, with a focus on translating innovations into real-world applications. Before NVIDIA, Eduardo held engineering roles at several semiconductor and energy technology companies.

Posts by Eduardo Alvarez

Generative AI

How Quantization Aware Training Enables Low-Precision Accuracy Recovery

After training AI models, a variety of compression techniques can be used to optimize them for deployment. The most common is post-training quantization (PTQ),... 10 MIN READ
Generative AI

NVIDIA Rubin CPX Accelerates Inference Performance and Efficiency for 1M+ Token Context Workloads

Inference has emerged as the new frontier of complexity in AI. Modern models are evolving into agentic systems capable of multi-step reasoning, persistent... 5 MIN READ
Generative AI

Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training

Major open-source foundational model releases are an exciting time for the AI community, bringing unique architectural innovations and capabilities. As the... 7 MIN READ
Data Center / Cloud

Optimizing LLMs for Performance and Accuracy with Post-Training Quantization

Quantization is a core tool for developers aiming to improve inference performance with minimal overhead. It delivers significant gains in latency, throughput,... 14 MIN READ
Data Center / Cloud

Introducing NVFP4 for Efficient and Accurate Low-Precision Inference

To get the most out of AI, optimizations are critical. When developers think about optimizing AI models for inference, model compression techniques—such as... 11 MIN READ