Rudy Pei

Rudy Pei is a senior deep learning algorithm engineer at NVIDIA working on efficient large-scale LLM inference. His work focuses on Dynamo’s KV-aware router, where he develops routing and scheduling algorithms for cache-aware serving, lower latency, and better resource utilization. He also uses DynoSim and synthetic workload generation to evaluate routing ideas under realistic serving conditions before real-cluster validation.

Posts by Rudy Pei

Agentic AI / Generative AI

DynoSim: Simulating the Pareto Frontier

Modern LLM serving is hard to tune because each deployment is a stack of interacting choices: model backend, tensor-parallel shape, prefill/decode split, worker...

12 MIN READ