IBM® Spectrum LSF is a complete workload management solution for demanding HPC environments. Featuring intelligent, policy-driven scheduling, it helps organizations improve competitiveness by accelerating research and design while controlling costs through superior resource utilization and ease of use. Building on over 28 years of experience, IBM Spectrum LSF features a highly scalable and available architecture designed to address the challenge of aligning compute resources with business priorities.

IBM Spectrum LSF supports heterogeneous computing environments, including NVIDIA GPUs. Because it can detect, monitor, and schedule GPU-enabled workloads onto the appropriate resources, IBM Spectrum LSF makes it easy for users to take advantage of the benefits GPUs provide.

Solution highlights include:

  • Policy-Driven Scheduling: Rich scheduling policies allow automated prioritization of workloads with service-level management, enabling the right work to be run at the right time for the right person.
  • GPU Management: LSF supports automatic detection and configuration of NVIDIA GPU resources, simplifying deployment, especially on cloud resources. Workloads are automatically contained within Linux control groups, ensuring they use only the resources assigned to them.
  • GPU Scheduling: LSF automatically switches the GPU to the mode required for the job, and fully leverages NVIDIA MIG, DCGM, and MPS. LSF can now dynamically reconfigure MIG on NVIDIA A100 GPUs to match workload demands.
  • Cloud Ready: LSF supports hybrid cloud with automated cloud bursting, allowing your environment to scale on demand. Intelligent data staging ensures that your data is available before workloads are launched or new cloud instances are created.
  • Containerized Execution: LSF enables unprivileged container support for Docker, Podman, Singularity, Shifter, Enroot, and Kubernetes, mixing containerized and non-containerized workloads and hiding the complexity of container management from end users.
  • Access Anywhere: A powerful and customizable web portal with a mobile client and REST API allows users to securely access the environment from anywhere.
  • Production Proven: LSF powers many of the world’s largest clusters in semiconductor and GPU design, health care and life sciences, and automotive and aerospace. Whether you are running on one node or thousands, one job or ten million, LSF grows with your needs.
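
To make the GPU scheduling highlight more concrete, here is a minimal sketch of how a GPU job request might be composed for LSF's bsub command. The helper name build_gpu_bsub and its parameters are hypothetical and exist only for illustration. The sketch assumes the -gpu requirement string takes colon-separated keys such as num, mode, mps, and mig; which keys are accepted depends on the installed LSF version, so check them against your cluster's documentation. The function only builds the argument list and does not submit anything.

```python
from typing import List, Optional


def build_gpu_bsub(
    command: List[str],
    num_gpus: int = 1,
    exclusive: bool = True,
    mps: bool = False,
    mig: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an argv list for a GPU job submission.

    Hypothetical helper: the key names below are assumed to match the
    bsub -gpu requirement string of your LSF version.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    # Ask LSF to place the GPU in the compute mode the job needs.
    mode = "exclusive_process" if exclusive else "shared"
    parts = [f"num={num_gpus}", f"mode={mode}"]

    if mps:
        # Request NVIDIA MPS for the job.
        parts.append("mps=yes")
    if mig is not None:
        # Assumed format: a MIG instance size such as "3/2".
        parts.append(f"mig={mig}")

    return ["bsub", "-gpu", ":".join(parts), *command]


if __name__ == "__main__":
    # Print the command line that would be submitted.
    print(" ".join(build_gpu_bsub(["./train.sh"], num_gpus=2, mps=True)))
```

On a host with LSF installed, the returned list could be passed to subprocess.run to submit the job. Keeping construction separate from submission lets the request be inspected or logged first.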

Resources:

Web: IBM Spectrum LSF

Blog: Dynamic management of NVIDIA DGX A100 with IBM Spectrum LSF