Noa Neria

Noa Neria is a senior system software engineer at NVIDIA, currently focusing on building open source infrastructure for LLM inference. Her primary technology interests are LLM inference, GPU virtualization, and distributed systems. She built this expertise by creating core GPU virtualization technologies, including fractional GPUs, at Run:ai (acquired by NVIDIA) and through her development of patented distributed NAS technologies at Dell. Dr. Neria holds a PhD Summa Cum Laude in Computational Chemical Physics from Tel Aviv University.
Posts by Noa Neria

AI Platforms / Deployment

Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer

Deploying large language models (LLMs) poses a challenge in optimizing inference efficiency. In particular, cold start delays—where models take significant...