Bright Cluster Manager

Base Command Manager (BCM) provides a comprehensive software solution for deploying and managing enterprise-grade Linux clusters. BCM provisions, monitors, and manages GPU clusters, and makes it an ongoing practice to incorporate the latest enhancements in NVIDIA GPU technology into its products, enabling BCM customers to use NVIDIA technology seamlessly.

Key Features

Intuitive web interface provides comprehensive view of GPU and cluster metrics
Powerful Cluster Management Shell (CMSH) as alternative administrative interface
Full support for NVIDIA libraries, CUDA, OpenCL, OpenACC, CUDA-aware libraries, NCCL, and CUB
Comprehensive GPU monitoring and health checking
Provisions GPU resources from public (AWS, Azure) and private (OpenStack) clouds within minutes
Auto scaling hybrid cloud based on workload and configured policies
Supports several popular Linux distributions: RHEL and derivatives, SUSE SLES and Ubuntu LTS
GPU-enabled Kubernetes and Singularity for running containers
Offers a complete machine learning stack
Deployment for popular HPC file systems and management of fast interconnects
Gives insight into how jobs utilize the resources that were allocated (e.g. GPU usage throughout a job’s runtime)
Provides reporting to administrators to identify users or groups of users that consistently underutilize the resources (e.g. GPUs) that they allocate for their jobs
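
As an illustration of the kind of underutilization report described in the last two features, the following is a minimal sketch in plain Python. It does not use any BCM interface: the job records, the 30% threshold, the minimum job count, and the function name are all assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical job records: (user, job id, GPU utilization samples in percent).
# In practice these would come from the cluster's job accounting data.
JOBS = [
    ("alice", "job-1", [92, 88, 95, 90]),
    ("alice", "job-2", [85, 91, 89]),
    ("bob", "job-3", [5, 12, 8, 3]),
    ("bob", "job-4", [20, 15, 10]),
    ("carol", "job-5", [60, 10, 70]),
]


def underutilizing_users(jobs, threshold=30.0, min_jobs=2):
    """Return users whose every job averaged below `threshold` percent GPU use.

    Users with fewer than `min_jobs` jobs are skipped, so that a single
    short job does not flag anyone as "consistently" underutilizing.
    """
    per_user = defaultdict(list)
    for user, _job_id, samples in jobs:
        if samples:  # ignore jobs without any samples
            per_user[user].append(mean(samples))
    return sorted(
        user
        for user, averages in per_user.items()
        if len(averages) >= min_jobs and all(a < threshold for a in averages)
    )


if __name__ == "__main__":
    print(underutilizing_users(JOBS))
```

With the sample data above, only bob is reported: both of his jobs average well under the threshold, while carol has just one job and is therefore not counted as consistent.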

BCM can sample and monitor metrics from supported GPUs and GPU computing systems, such as NVIDIA Hopper, Blackwell, Grace Hopper, and Grace Blackwell, as well as legacy GPUs.

Examples of supported metrics include:

GPU temperatures
GPU exclusivity modes
GPU fan speeds
System fan speeds
PSU voltages and currents
System LED states
GPU ECC statistics
GPU utilization
GPU memory usage

BCM leverages NVIDIA’s Data Center GPU Manager (DCGM) for GPU health monitoring, diagnostics, and validation.
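
To make the idea of health checking on sampled metrics concrete, here is a small threshold-check sketch in plain Python. It is not the BCM or DCGM API: the metric names, the limit values, and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical upper limits for a few of the metric types listed above.
# Real limits depend on the GPU model and site policy.
LIMITS = {
    "gpu_temperature_c": 85.0,
    "gpu_memory_used_pct": 95.0,
    "gpu_ecc_double_bit_errors": 0,
}


def check_sample(sample, limits=LIMITS):
    """Return the sorted names of metrics whose value exceeds its limit.

    Metrics missing from `sample` are skipped rather than treated as failures.
    """
    return sorted(
        name
        for name, limit in limits.items()
        if name in sample and sample[name] > limit
    )


if __name__ == "__main__":
    sample = {
        "gpu_temperature_c": 91.0,
        "gpu_memory_used_pct": 40.0,
        "gpu_ecc_double_bit_errors": 0,
    }
    print(check_sample(sample))
```

An empty result means the sample passed every configured check; a non-empty result names the metrics that would warrant an alert or a failed health check.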

Key benefits:

Unprecedented ease of use
Significant cost and time savings
Increased uptime and productivity
Scalability up to 100,000 compute nodes
Significant cost savings through dynamic scaling

BCM for Data Science

BCM empowers organizations to gain actionable insights from rich, complex data. To achieve this, BCM offers a comprehensive deep learning solution that includes:

A modern deep learning environment - BCM provides everything needed to spin up an effective deep learning environment, and manage it effectively
Choice of machine learning frameworks - BCM provides a choice of machine learning frameworks to simplify deep learning projects, including TensorFlow, TensorFlow 2, Horovod, Keras, PyTorch, Chainer, fast.ai, DyNet, MXNet, and Theano.
Choice of machine learning libraries - BCM includes a selection of the most popular machine learning libraries and tools to help access datasets, including cuDNN, ONNX, and TensorRT.
Frameworks are provided for Python 3.7 and for multiple versions of CUDA
Supporting infrastructure elements - BCM takes care of finding, configuring, and deploying all of the dependent pieces needed to run deep learning libraries and frameworks, and includes over 400 MB of Python modules that support the machine learning packages, plus the NVIDIA hardware drivers, CUDA (parallel computing platform API) drivers, CUB (CUDA building blocks), and NCCL2 (library of standard collective communication routines)
Because the machine learning frameworks and libraries are constantly being updated, please see the BCM packages dashboard for the most up-to-date information.
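
Since the available set of frameworks changes over time, one quick way to see what a given Python environment actually offers is to probe for importable modules. The sketch below uses only the Python standard library. The candidate list is an assumption based on the usual import names of some of the frameworks mentioned above, and it says nothing about how BCM packages or loads them.

```python
import importlib.util

# Assumed import names for some of the frameworks named above.
CANDIDATES = ["tensorflow", "torch", "keras", "horovod", "mxnet", "chainer"]


def available_modules(names):
    """Map each top-level module name to True if it can be found for import.

    find_spec only locates the module; it does not import or initialize it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in available_modules(CANDIDATES).items():
        print(f"{name}: {'available' if found else 'not found'}")
```

Run this inside the environment in which the jobs will execute, since the result depends on that environment's module search path.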
