Gargi Prasad

Gargi Prasad is the program lead for resilience in DGX Cloud at NVIDIA. Her main focus areas are AI infrastructure resilience and performance optimization. Prior to NVIDIA, Gargi worked in Core Infra at Meta, serving large-scale distributed systems. She has expertise in software and systems engineering and architecture, and has more than 15 years of industry experience. Gargi has a master’s degree in Computer Science from Delft University of Technology with a specialization in Parallel & Distributed Systems.

Posts by Gargi Prasad

Networking / Communications

Enhancing Communication Observability of AI Workloads with NCCL Inspector

When using the NVIDIA Collective Communication Library (NCCL) to run a deep learning training or inference workload that uses collective operations (such as... 6 MIN READ