Slurm is a highly configurable open-source workload and resource manager. In its simplest configuration, Slurm can be installed and configured in a few minutes. Optional plugins provide the functionality needed by demanding HPC centers with diverse job types, policies and workflows. Advanced configurations use plugins to provide features such as accounting, resource limit management by user or bank account, and sophisticated scheduling algorithms.
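As a rough illustration of how plugins are selected, the excerpt below sketches a small slurm.conf. It is a minimal example rather than a recommended setup: the cluster, host, node and partition names and the hardware sizes are placeholders chosen for illustration, and a real installation needs further settings (for example, a running slurmdbd for the accounting plugin and a topology.conf for the tree topology plugin).

```conf
# Hypothetical slurm.conf excerpt; all names and sizes are placeholders.
ClusterName=demo
SlurmctldHost=ctl1
SlurmctldHost=ctl2            # second entry acts as the backup controller

# Plugin selection
SchedulerType=sched/backfill                        # backfill scheduling
SelectType=select/cons_tres                         # allocate cores, memory, GPUs
PriorityType=priority/multifactor                   # fair-share and other priority factors
AccountingStorageType=accounting_storage/slurmdbd   # accounting through slurmdbd
TopologyPlugin=topology/tree                        # topology-aware placement
PreemptType=preempt/partition_prio                  # preempt by partition priority
PreemptMode=SUSPEND,GANG                            # suspend and resume preempted jobs

# Nodes and partitions
NodeName=node[01-04] CPUs=16 RealMemory=64000 State=UNKNOWN
PartitionName=debug Nodes=node[01-04] Default=YES MaxTime=01:00:00 State=UP
```

Each `...Type` or `...Plugin` line names one plugin, so changing a scheduling or accounting policy is typically a matter of editing that line and reconfiguring the daemons.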
SchedMD is the core company behind the Slurm workload manager. Slurm currently performs workload management on six of the ten most powerful computers in the world, including the number 1 system, Tianhe-2, with 3,120,000 computing cores, and the number 6 system, the GPGPU giant Piz Daint, which uses over 5,000 NVIDIA GPGPUs.
SchedMD performs the majority of Slurm development, reviews and integrates contributions from others, and distributes and maintains the canonical version of Slurm. It also provides support, installation, configuration, custom development and training.
Key Features of Slurm
- Scales to millions of cores and tens of thousands of GPGPUs
- Military-grade security
- Heterogeneous platform support allowing users to take advantage of GPGPUs
- Flexible plugin framework enables Slurm to meet complex customization requirements
- Topology-aware job scheduling for maximum system utilization
- Open Source
- Extensive scheduling options including advanced reservations, suspend/resume, backfill, fair-share and preemptive scheduling for critical jobs
- No single point of failure
- Slurm enables new artificial intelligence (AI) capabilities to address some of the most challenging priorities on the largest AI systems in the world