NVIDIA cuNumeric

Bringing GPU-Accelerated Supercomputing to the NumPy Ecosystem

Python has become the most widely used language for data science, machine learning, and productive numerical computing. NumPy is its de facto standard math and matrix library. It offers a simple, easy-to-use programming model whose interfaces closely match the mathematical needs of scientific applications, and it is the foundation on which many of the most widely used data science and machine learning environments are built.

As datasets expand and programs become more complex, there is an increasing need to harness computational resources far beyond what a single CPU-only node can provide.

NVIDIA cuNumeric aspires to be a drop-in replacement library for NumPy, bringing distributed and accelerated computing on the NVIDIA platform to the Python community. Download the Alpha release of cuNumeric today.
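As a minimal sketch of what "drop-in" means in practice, the usual pattern is to change only the import and keep writing ordinary NumPy code. The `covariance` helper below is a hypothetical example, and the fallback to plain NumPy is an assumption added so the snippet also runs on machines without cuNumeric installed.

```python
# Swap the import; the rest of the program keeps using the NumPy API.
try:
    import cunumeric as np  # Legate-backed arrays (GPU/multi-node) if installed
except ImportError:
    import numpy as np  # fallback: identical code on a plain CPU install


def covariance(data):
    """Population covariance of the columns of a 2-D array (hypothetical helper)."""
    centered = data - data.mean(axis=0)  # subtract each column's mean
    return centered.T @ centered / data.shape[0]


samples = np.array([[1.0, 2.0], [3.0, 6.0]])
print(covariance(samples))  # expected values: [[1, 2], [2, 4]]
```

Nothing in `covariance` refers to cuNumeric; only the import line decides which library executes it.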


Legate is an abstraction layer that runs on top of a runtime system; together, they provide scalable implementations of popular domain-specific APIs. Legate offers an API similar to Apache Arrow's, but with stronger guarantees about data coherence and synchronization, which aids library developers. Like other Legate-based libraries, NVIDIA cuNumeric is built on top of Legate.

Legate democratizes computing: it lets any programmer harness large clusters of CPUs and GPUs by running, at scale, the same code that runs on a desktop or laptop. With this technology, computational and data scientists can develop and test programs on moderately sized datasets on local machines, then immediately scale up to larger datasets deployed on many nodes in the cloud or on a supercomputer, without modifying their code.
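To illustrate the develop-small, scale-up workflow, here is a sketch in which the problem size is just a parameter: a Jacobi relaxation of a 2-D heat grid written with whole-array slicing. The function name, boundary condition, and sizes are hypothetical. Under the claim above, a small call validates the code locally, and a much larger `n` could run on a cluster from the same source when launched through the Legate driver (consult the cuNumeric documentation for the exact launcher options).

```python
try:
    import cunumeric as np  # Legate-backed arrays if installed
except ImportError:
    import numpy as np  # fallback: same code on a single CPU node


def jacobi_heat(n, iterations):
    """Relax an (n+2) x (n+2) grid toward steady state; n sets the problem size."""
    grid = np.zeros((n + 2, n + 2))
    grid[0, :] = 1.0  # hypothetical boundary condition: hot top edge, other edges 0
    for _ in range(iterations):
        # Average the four neighbours of every interior point. The right-hand
        # side is evaluated into a new array first, so this is a Jacobi step.
        interior = 0.25 * (
            grid[:-2, 1:-1] + grid[2:, 1:-1] + grid[1:-1, :-2] + grid[1:-1, 2:]
        )
        grid[1:-1, 1:-1] = interior
    return grid


small = jacobi_heat(16, 50)  # quick correctness check on a laptop
```

Calling `jacobi_heat(16, 50)` is enough to check the logic locally. Keeping every step a whole-array expression, with no Python loops over elements, is the style of code an array library such as cuNumeric is designed to partition across devices.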

Key Benefits

The NVIDIA cuNumeric library on Legate:

  • Transparently accelerates and scales existing NumPy workflows
  • Scales optimally to thousands of GPUs
  • Requires no code changes, preserving developer productivity
  • Is freely available: get started on GitHub or Conda

cuNumeric Performance

Weak Scaling of Richardson-Lucy Deconvolution on NVIDIA DGX SuperPOD
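Richardson-Lucy deconvolution, the workload in the scaling benchmark above, is an iterative method: it re-blurs the current estimate with the point spread function (PSF), compares the result with the observed image, and applies a multiplicative correction. The sketch below is a minimal 1-D version written only with standard NumPy calls. It is not the benchmark's implementation, and whether cuNumeric supports every call used here (for example `np.convolve` with `mode="same"`) is an assumption to check against its API coverage list before swapping the import.

```python
import numpy as np  # assumption: swap for cunumeric only if it covers these calls


def richardson_lucy(observed, psf, iterations=30, eps=1e-12):
    """Minimal 1-D Richardson-Lucy deconvolution (illustrative sketch)."""
    psf = psf / psf.sum()  # normalize so blurring conserves total intensity
    psf_mirror = psf[::-1]  # flipped kernel applies the adjoint of the blur
    estimate = np.full_like(observed, observed.mean())  # flat, positive start
    for _ in range(iterations):
        reblurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (reblurred + eps)  # eps guards against division by zero
        estimate = estimate * np.convolve(ratio, psf_mirror, mode="same")
    return estimate


# Usage: blur two point sources with a Gaussian PSF, then deconvolve.
x = np.arange(-4, 5)
psf = np.exp(-(x**2) / (2 * 1.5**2))
truth = np.zeros(64)
truth[20], truth[40] = 1.0, 0.5
observed = np.convolve(truth, psf / psf.sum(), mode="same")
restored = richardson_lucy(observed, psf)
```

Because every factor in the update is nonnegative, the estimate stays nonnegative, which suits intensity images. Extending the sketch to 2-D or 3-D images would mean replacing `np.convolve` with a multidimensional (for example FFT-based) convolution.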

Processing 10 TB of Microscopy Image Data as a Single NumPy Array

In this example, a multi-view lattice light-sheet microscope produces tens of terabytes (TB) of raw image data per day. Until now, all processing happened offline, after all the data had been collected. By moving the preprocessing and reconstruction operations to GPUs and using cuNumeric on Legate, the data can be visualized in real time as it is processed.
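Treating the whole acquisition as one array means preprocessing can be written as whole-array expressions rather than hand-managed tiles. The sketch below is hypothetical: the `preprocess` function, the camera offset of 100 counts, and the use of a maximum-intensity projection as the live preview are illustrative assumptions, not the pipeline used in this example.

```python
try:
    import cunumeric as np  # one logical array, partitioned by Legate if installed
except ImportError:
    import numpy as np  # fallback so the sketch runs on a single CPU node


def preprocess(stack, camera_offset=100.0):
    """Hypothetical preprocessing of a (z, y, x) image stack."""
    # Remove the assumed camera offset and clamp negative counts to zero.
    corrected = np.maximum(stack.astype(np.float32) - camera_offset, 0.0)
    # Maximum-intensity projection along z: a cheap 2-D preview for live display.
    preview = corrected.max(axis=0)
    return corrected, preview


stack = np.array([[[90, 150], [200, 100]], [[120, 110], [100, 300]]])
corrected, preview = preprocess(stack)
```

The same two lines of array code apply whether `stack` holds a few pixels, as here, or an entire acquisition; only the array's size changes.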

Get Started with cuNumeric today.