Assign the Right Amount of Compute Power to Users, Automatically

Run:AI’s Kubernetes-based software platform for orchestrating containerized AI workloads lets GPU clusters be used dynamically across Deep Learning workloads, from building AI models, to training, to inference. With Run:AI, jobs at any stage automatically get the compute power they need.

Run:AI’s compute management platform speeds up data science initiatives by pooling available resources and then dynamically allocating them based on need, maximizing accessible compute.

Key Features

  • Fair-share scheduling to allow users to easily and automatically share clusters of GPUs
  • Simplified multi-GPU distributed training
  • Visibility into workloads and resource utilization to improve user productivity
  • Control for cluster admin and ops teams, to align priorities to business goals
  • On-demand access to Multi-Instance GPU (MIG) instances for the A100 GPU

Key Benefits

Advanced Kubernetes-based Scheduling Eliminates Static GPU Allocation

[Run:AI dashboard]

The Run:AI Scheduler manages tasks in batches using multiple queues on top of Kubernetes, allowing system admins to define different rules, policies, and requirements for each queue based on business priorities. Combined with an over-quota system and configurable fairness policies, the allocation of resources can be automated and optimized to allow maximum utilization of cluster resources.
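To make the idea concrete, here is a minimal sketch of quota-plus-over-quota allocation. This is an illustrative toy, not Run:AI’s actual scheduler: each queue first receives its guaranteed quota (capped by demand), and remaining idle GPUs are then handed out round-robin to queues whose demand exceeds their quota, which corresponds to an equal-share fairness policy. The `Queue` class and `allocate` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

@dataclass
class Queue:
    name: str
    quota: int   # guaranteed GPUs for this queue (assumed to sum to <= total_gpus)
    demand: int  # GPUs currently requested by jobs in this queue

def allocate(queues, total_gpus):
    """Grant each queue its guaranteed quota (capped by demand), then
    distribute idle GPUs round-robin among queues still wanting more."""
    grants = {q.name: min(q.quota, q.demand) for q in queues}
    idle = total_gpus - sum(grants.values())
    hungry = [q for q in queues if q.demand > grants[q.name]]
    while idle > 0 and hungry:
        for q in list(hungry):
            if idle == 0:
                break
            grants[q.name] += 1  # over-quota grant from the idle pool
            idle -= 1
            if grants[q.name] == q.demand:
                hungry.remove(q)
    return grants
```

With 10 GPUs, a queue with quota 4 demanding 8 and a queue with quota 4 demanding 2, the first queue is granted 8 GPUs: its quota plus the second queue’s idle capacity. A real scheduler would also reclaim over-quota GPUs via preemption when the lending queue’s demand returns.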

Because it was built as a plug-in to Kubernetes, Run:AI’s scheduler requires no complex setup, and it is certified to integrate with a range of Kubernetes distributions, including Red Hat OpenShift and HPE Ezmeral.
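Because the scheduler plugs into Kubernetes, opting a workload in is a matter of pod configuration. The sketch below shows the general shape; the scheduler name, queue label key, and image are assumptions that should be checked against the Run:AI documentation for your version.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-job
  labels:
    runai/queue: team-a            # project/queue name -- verify the label key for your version
spec:
  schedulerName: runai-scheduler   # hand scheduling decisions to the Run:AI scheduler
  containers:
    - name: trainer
      image: registry.example.com/train:latest   # hypothetical training image
      resources:
        limits:
          nvidia.com/gpu: 2
```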

No More Idle Resources

Run:AI’s over-quota system lets users automatically tap idle resources when they are available, governed by configurable fairness policies. The platform allocates resources dynamically for full utilization of cluster resources. Customers typically see cluster utilization improve from around 25% when we start working with them to over 75%.

Bridge Between HPC and AI

Bridging the efficiency of High-Performance Computing and the simplicity of Kubernetes, the Run:AI Scheduler lets users easily consume whole GPUs, multiple nodes of GPUs, and even GPU MIG instances for distributed training on Kubernetes. In this way, AI workloads run based on need, not static capacity.
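For MIG, the NVIDIA device plugin exposes each MIG slice as a named extended resource that pods can request like any other. The fragment below assumes an A100 partitioned into `1g.5gb` profiles; the exact resource names depend on how MIG is configured on your nodes.

```yaml
# Container resource request for a single MIG slice (A100, 1g.5gb profile)
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1   # one MIG instance, as exposed by the NVIDIA device plugin
```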

More Information