DEVELOPER BLOG

Ahmed Al-Sudani

Ahmed Al-Sudani is a software engineer on the DCGM team at NVIDIA. He works on enabling health and performance monitoring in data center environments.

Posts by Ahmed Al-Sudani

AI / Deep Learning

Monitoring GPUs in Kubernetes with DCGM

Monitoring GPUs is critical for infrastructure or site reliability engineering (SRE) teams who manage large-scale GPU clusters for AI or HPC workloads.

12 MIN READ