Ekin Karabulut

Ekin Karabulut is a data scientist and developer advocate at NVIDIA, previously at Run:ai, exploring the efficient use of large models in different production scenarios. She previously worked on the privacy implications of federated learning, focused on distributed training techniques, and became fascinated by inefficiencies in GPU usage in research and industry settings. She established the AI Infrastructure Club and is based in Munich, Germany.

Posts by Ekin Karabulut

Data Center / Cloud

Deploying Disaggregated LLM Inference Workloads on Kubernetes

As large language model (LLM) inference workloads grow in complexity, a single monolithic serving process starts to hit its limits. Prefill and decode stages... 14 MIN READ
Data Center / Cloud

Maximizing GPU Utilization with NVIDIA Run:ai and NVIDIA NIM

Organizations deploying LLMs are challenged by inference workloads with different resource requirements. A small embedding model might use only a few gigabytes... 11 MIN READ
Data Center / Cloud

Unlock Massive Token Throughput with GPU Fractioning in NVIDIA Run:ai

As AI workloads scale, achieving high throughput, efficient resource usage, and predictable latency becomes essential. NVIDIA Run:ai addresses these challenges... 13 MIN READ
Data Center / Cloud

Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare

NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to... 11 MIN READ
Agentic AI / Generative AI

Streamline Complex AI Inference on Kubernetes with NVIDIA Grove

Over the past few years, AI inference has evolved from single-model, single-pod deployments into complex, multicomponent systems. A model deployment may now... 10 MIN READ
Agentic AI / Generative AI

Enable Gang Scheduling and Workload Prioritization in Ray with NVIDIA KAI Scheduler

NVIDIA KAI Scheduler is now natively integrated with KubeRay, bringing the same scheduling engine that powers high‑demand and high-scale environments in... 10 MIN READ