Fagani Hajizada

Fagani Hajizada is a senior software engineer at NVIDIA, focused on scheduling and orchestration for HPC and AI workloads on large-scale GPU clusters. He leads the design and delivery of the infrastructure that powers production-scale AI systems, spanning Kubernetes-based HPC, distributed systems, observability, and developer tooling. Fagani also contributes to open source projects, including Slinky, gpu-operator, kubernetes-sigs/node-feature-discovery, and kubernetes-sigs/e2e-framework.
Posts by Fagani Hajizada

Data Center / Cloud

Running Large-Scale GPU Workloads on Kubernetes with Slurm

Slurm is an open source cluster management and job scheduling system for Linux. It manages job scheduling for over 65% of TOP500 systems. Most organizations... 9 MIN READ