Posts by Ekin Karabulut
Data Center / Cloud
Feb 27, 2026
Maximizing GPU Utilization with NVIDIA Run:ai and NVIDIA NIM
Organizations deploying LLMs are challenged by inference workloads with different resource requirements. A small embedding model might use only a few gigabytes...
11 MIN READ
Data Center / Cloud
Feb 18, 2026
Unlock Massive Token Throughput with GPU Fractioning in NVIDIA Run:ai
As AI workloads scale, achieving high throughput, efficient resource usage, and predictable latency becomes essential. NVIDIA Run:ai addresses these challenges...
13 MIN READ
Data Center / Cloud
Jan 28, 2026
Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare
NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to...
11 MIN READ
Agentic AI / Generative AI
Nov 10, 2025
Streamline Complex AI Inference on Kubernetes with NVIDIA Grove
Over the past few years, AI inference has evolved from single-model, single-pod deployments into complex, multicomponent systems. A model deployment may now...
10 MIN READ
Agentic AI / Generative AI
Oct 03, 2025
Enable Gang Scheduling and Workload Prioritization in Ray with NVIDIA KAI Scheduler
NVIDIA KAI Scheduler is now natively integrated with KubeRay, bringing the same scheduling engine that powers high‑demand and high-scale environments in...
10 MIN READ
Data Center / Cloud
Sep 29, 2025
Smart Multi-Node Scheduling for Fast and Efficient LLM Inference with NVIDIA Run:ai and NVIDIA Dynamo
The exponential growth in large language model complexity has created challenges, such as models too large for single GPUs, workloads that demand high...
9 MIN READ