Eduardo Alvarez

Eduardo Alvarez is a senior technical lead at NVIDIA, where he focuses on AI inference at scale, performance optimization, workload economic analysis, and application enablement. He has a deep background in AI systems engineering, workload optimization, and accelerated computing—focused on translating innovations into real-world applications. Before NVIDIA, Eduardo held engineering roles at various semiconductor and energy tech companies.
Avatar photo

Posts by Eduardo Alvarez

Decorative image.
Data Center / Cloud

Unlock Exascale Performance on NVIDIA GB200 NVL72 with Slurm Topology-Aware Job Scheduling

As AI models grow in scale and complexity, realizing the full performance of modern accelerated infrastructure depends as much on how workloads are placed as on... 10 MIN READ
Agentic AI / Generative AI

How the NVIDIA Vera Rubin Platform is Solving Agentic AI’s Scale-Up Problem

Agentic inference has fundamentally changed the runtime dynamics of inference workloads by introducing non-deterministic trajectories—actions, observations,... 8 MIN READ
Agentic AI / Generative AI

Building for the Rising Complexity of Agentic Systems with Extreme Co-Design

Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different.  Agents don't... 12 MIN READ
Agentic AI / Generative AI

Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints

DeepSeek just launched its fourth generation of flagship models with DeepSeek-V4-Pro and DeepSeek-V4-Flash, both targeted at enabling highly efficient... 6 MIN READ
Data Center / Cloud

Top 5 AI Model Optimization Techniques for Faster, Smarter Inference

As AI models get larger and architectures more complex, researchers and engineers are continuously finding new techniques to optimize the performance and... 6 MIN READ
Data Center / Cloud

Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache

Quantization is one of the strongest levers for large-scale inference. By reducing the precision of weights, activations, and KV cache, we can reduce the memory... 10 MIN READ