Posts by Ekin Karabulut
Data Center / Cloud
Jan 28, 2026
Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare
NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to...
11 MIN READ
Agentic AI / Generative AI
Nov 10, 2025
Streamline Complex AI Inference on Kubernetes with NVIDIA Grove
Over the past few years, AI inference has evolved from single-model, single-pod deployments into complex, multicomponent systems. A model deployment may now...
10 MIN READ
Agentic AI / Generative AI
Oct 03, 2025
Enable Gang Scheduling and Workload Prioritization in Ray with NVIDIA KAI Scheduler
NVIDIA KAI Scheduler is now natively integrated with KubeRay, bringing the same scheduling engine that powers high-demand and high-scale environments in...
10 MIN READ
Data Center / Cloud
Sep 29, 2025
Smart Multi-Node Scheduling for Fast and Efficient LLM Inference with NVIDIA Run:ai and NVIDIA Dynamo
The exponential growth in large language model complexity has created challenges, such as models too large for single GPUs, workloads that demand high...
9 MIN READ
Agentic AI / Generative AI
Sep 16, 2025
Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer
Deploying large language models (LLMs) poses a challenge in optimizing inference efficiency. In particular, cold start delays—where models take significant...
13 MIN READ
Agentic AI / Generative AI
Sep 02, 2025
Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap
Deploying large language models (LLMs) at scale presents a dual challenge: ensuring fast responsiveness during high demand, while managing the costs of GPUs....
6 MIN READ