Managing your cluster and scheduling jobs on your GPU cluster can be simple and intuitive with industry-leading solutions that now support NVIDIA GPUs.

Bright Cluster Manager

A totally integrated, single solution for deploying, testing, provisioning, monitoring and managing GPU clusters. With Bright Cluster Manager, a cluster administrator can easily install and manage multiple clusters simultaneously.


Ganglia

An open-source, scalable, distributed monitoring system for high-performance computing systems such as clusters and grids. It is carefully engineered to achieve very low per-node overheads and high concurrency. Ganglia is currently in use on thousands of clusters around the world and can scale to handle clusters with several thousand nodes.

StackIQ Boss for HPC with CUDA Pallet

Build and deploy clusters that leverage NVIDIA GPUs for general-purpose computing. By integrating the CUDA Pallet with StackIQ Boss for HPC, users benefit from rapid configuration and reliable, predictable performance from their cluster, thanks to the parallel Avalanche installer, database-driven library, and central operator’s console.


A suite of tools for managing and monitoring Tesla™ GPUs in cluster environments.
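As an illustration of the kind of per-GPU data that monitoring tools report, the following Python sketch parses CSV output from the `nvidia-smi` command-line utility. It assumes `nvidia-smi` was run with `--query-gpu=index,name,utilization.gpu,memory.used --format=csv,noheader,nounits`; the function name `parse_gpu_report` and the sample values are hypothetical and used only for illustration.

```python
import csv
import io

# Fields assumed to be requested from nvidia-smi, in this order:
#   nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used \
#              --format=csv,noheader,nounits
FIELDS = ("index", "name", "utilization_pct", "memory_used_mib")


def parse_gpu_report(text):
    """Parse nvidia-smi CSV output (no header, no units) into a list of dicts."""
    gpus = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        record = dict(zip(FIELDS, (cell.strip() for cell in row)))
        # Convert the numeric columns; the name column stays a string.
        record["index"] = int(record["index"])
        record["utilization_pct"] = int(record["utilization_pct"])
        record["memory_used_mib"] = int(record["memory_used_mib"])
        gpus.append(record)
    return gpus


# Hypothetical sample output for a two-GPU node (illustrative values only).
SAMPLE = "0, Tesla K80, 37, 1024\n1, Tesla K80, 0, 11\n"

if __name__ == "__main__":
    for gpu in parse_gpu_report(SAMPLE):
        print(gpu)
```

This sketch does not handle non-numeric values, which `nvidia-smi` may report for fields a given GPU does not support; a production monitor would need to account for that.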

IBM Spectrum LSF

A powerful workload management platform for demanding, distributed HPC environments. It provides a comprehensive set of intelligent, policy-driven scheduling features that enable you to utilize all of your compute infrastructure resources and ensure optimal application performance.

Altair PBS Professional

The industry-leading Altair® PBS Professional® workload manager and job scheduler is designed to improve productivity, optimize utilization and efficiency, and simplify administration for HPC clusters, clouds, and supercomputers. PBS Professional automates job scheduling, management, monitoring, and reporting, and it’s the trusted solution for complex Top500 systems as well as smaller clusters.

Grid Engine

An industry-leading distributed resource management (DRM) system used by hundreds of companies worldwide to build large compute cluster infrastructures for processing massive volumes of workload. A highly scalable and reliable DRM system, Grid Engine enables companies to produce higher-quality products, reduce time to market, and streamline and simplify the computing environment.


TORQUE

An open-source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability and fault tolerance.


Slurm

Slurm is an open-source workload manager designed specifically to satisfy the demanding needs of high-performance computing. Slurm is in widespread use at government laboratories, universities, and companies worldwide. As of the November 2014 TOP500 list, Slurm was performing workload management on six of the ten most powerful computers in the world, including the GPU giant Piz Daint, which uses over 5,000 NVIDIA GPUs.
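To make GPU job scheduling concrete, here is a minimal Python sketch that generates a Slurm batch script requesting GPUs through the `--gres=gpu:N` option. It assumes the cluster has GPUs configured as a generic resource named `gpu`; the helper name `build_sbatch_script`, the job name, and the `./my_cuda_app` command are hypothetical placeholders.

```python
def build_sbatch_script(job_name, gpus_per_node, command,
                        nodes=1, time_limit="00:10:00"):
    """Return the text of a Slurm batch script that requests GPUs.

    Assumes the site exposes GPUs as the generic resource "gpu".
    """
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gres=gpu:{gpus_per_node}",  # GPUs requested per node
        f"#SBATCH --time={time_limit}",         # wall-clock limit, HH:MM:SS
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "./my_cuda_app" is a hypothetical executable used for illustration.
    print(build_sbatch_script("demo", 2, "./my_cuda_app"), end="")
```

The generated text could be saved to a file and submitted with `sbatch`; the partition, account, and resource names a real site requires will vary.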

Looking for help with your GPU cluster?
Get in touch with industry experts and NVIDIA engineers on the CUDA Developer forums.