Bright Computing provides comprehensive software solutions for deploying and managing enterprise-grade Linux clusters. Bright provisions, monitors and manages GPU clusters, and makes it an ongoing practice to incorporate the latest enhancements in NVIDIA GPU technology into its products, enabling Bright customers to seamlessly use NVIDIA technology.

Key Features

  • Intuitive web interface provides comprehensive view of GPU and cluster metrics
  • Powerful Cluster Management Shell (CMSH) as alternative administrative interface
  • Full support for NVIDIA libraries, CUDA, OpenCL, OpenACC, CUDA-aware libraries, NCCL, and CUB
  • Comprehensive GPU monitoring and health checking
  • Provisions GPU resources from public (AWS, Azure) and private (OpenStack) clouds within minutes
  • Automatically scales hybrid cloud resources based on workload and configured policies
  • Supports several popular Linux distributions: RHEL and derivatives, SUSE SLES and Ubuntu LTS
  • GPU-enabled Kubernetes and Singularity for running containers
  • Offers a complete machine learning stack
  • Deploys popular HPC file systems and manages fast interconnects
  • Gives insight into how jobs utilize the resources that were allocated (e.g. GPU usage throughout a job’s runtime)
  • Provides reports that help administrators identify users or groups of users who consistently underutilize the resources (e.g. GPUs) that they allocate for their jobs
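
As a rough illustration of the job-level utilization reporting described in the last two items, the Python sketch below aggregates per-job GPU utilization by user and flags users whose allocation-weighted average falls below a threshold. The record layout, the `flag_underutilizers` helper, the 30% threshold, and the sample data are all hypothetical; this is not Bright's implementation or API, only a minimal picture of the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job accounting record (not a Bright data structure)."""
    user: str
    gpu_hours_allocated: float
    mean_gpu_utilization: float  # fraction between 0.0 and 1.0


def flag_underutilizers(jobs, threshold=0.30, min_jobs=2):
    """Return {user: weighted mean utilization} for users below the threshold.

    Utilization is weighted by allocated GPU hours, and users with fewer
    than min_jobs jobs are skipped so a single short job does not count
    as "consistent" underutilization.
    """
    # Per user: [sum of utilization * hours, sum of hours, job count]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for job in jobs:
        entry = totals[job.user]
        entry[0] += job.mean_gpu_utilization * job.gpu_hours_allocated
        entry[1] += job.gpu_hours_allocated
        entry[2] += 1

    report = {}
    for user, (weighted, hours, count) in totals.items():
        if count < min_jobs or hours == 0:
            continue
        average = weighted / hours
        if average < threshold:
            report[user] = round(average, 3)
    return report


# Made-up sample data for illustration only.
sample_jobs = [
    JobRecord("alice", 10.0, 0.85),
    JobRecord("alice", 5.0, 0.90),
    JobRecord("bob", 8.0, 0.10),
    JobRecord("bob", 8.0, 0.20),
    JobRecord("carol", 4.0, 0.05),  # single job, so not reported
]

print(flag_underutilizers(sample_jobs))
```

Weighting by allocated GPU hours keeps a large, poorly used allocation from being masked by many small, well-used ones; a real report would draw these records from the cluster's job accounting data.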

Bright Cluster Manager can sample and monitor metrics from supported GPUs and GPU computing systems, such as the NVIDIA A100, V100, P100, and T4 data center GPUs, as well as commodity GPUs.

Examples of supported metrics include:

  • GPU temperatures
  • GPU exclusivity modes
  • GPU fan speeds
  • System fan speeds
  • PSU voltages and currents
  • System LED states
  • GPU ECC statistics
  • GPU utilization
  • GPU memory usage

Bright Cluster Manager leverages NVIDIA’s Data Center GPU Manager (DCGM) for GPU health monitoring, diagnostics, and validation.

Key benefits:

  • Unprecedented ease of use
  • Significant cost and time savings
  • Increased uptime and productivity
  • Scalability up to 100,000 compute nodes
  • Significant cost savings through dynamic scaling

Bright for Data Science

Bright empowers organizations to gain actionable insights from rich, complex data. To achieve this, Bright offers a comprehensive deep learning solution that includes:

  • A modern deep learning environment - Bright provides everything needed to spin up a deep learning environment and manage it effectively.
  • Choice of machine learning frameworks - Bright Cluster Manager provides a choice of machine learning frameworks to simplify deep learning projects, including TensorFlow, TensorFlow 2, Horovod, Keras, PyTorch, Chainer, fast.ai, DyNet, MXNet, and Theano.
  • Choice of machine learning libraries - Bright includes a selection of the most popular machine learning libraries and tools to help access datasets, including cuDNN, ONNX, and TensorRT.
  • Frameworks are provided for Python 3.7 and for multiple versions of CUDA
  • Supporting infrastructure elements - Bright takes care of finding, configuring, and deploying all of the dependent pieces needed to run deep learning libraries and frameworks. This includes over 400 MB of Python modules that support the machine learning packages, plus the NVIDIA hardware drivers, CUDA (parallel computing platform and API) drivers, CUB (CUDA building blocks), and NCCL2 (library of standard collective communication routines).
  • Because the machine learning frameworks and libraries are constantly being updated, please see the Bright packages dashboard for the most up-to-date information.

For more information: