Bringing GPU-Accelerated Supercomputing to the NumPy Ecosystem
Python has become the most widely used language for data science, machine learning, and productive numerical computing. NumPy is the de facto standard math and matrix library, providing a simple and easy-to-use programming model whose interfaces correspond closely to the mathematical needs of scientific applications, making it the foundation upon which many of the most widely used data science and machine learning programming environments are constructed.
As datasets continue to expand in size and programs continue to increase in complexity, there’s a growing need to solve these problems by harnessing computational resources far beyond what a single CPU-only node can provide.
NVIDIA cuNumeric aspires to be a drop-in replacement library for NumPy, bringing distributed and accelerated computing on the NVIDIA platform to the Python community. Download the latest release of cuNumeric today.
Support for Accelerated Computing
NVIDIA recently announced the availability of a public Beta version of cuNumeric. cuNumeric supports all NumPy features, such as in-place updates, broadcasting, and full indexing view semantics. This means that any Python code that uses NumPy to operate on large datasets can be automatically parallelized to leverage the power of large clusters of CPUs and GPUs when switched to using cuNumeric.
Since its debut at GTC in March 2022, cuNumeric has rapidly grown to support additional NumPy functionalities, as well as, improve performance and user experience. Central to this progress is Legate, a framework for scalable and composable distributed software.
The canonical implementation of NumPy used by most programmers runs on a single CPU core and only a few operations are parallelized across cores. This restriction to single-node CPU execution limits both the size of data that can be processed and the speed with which problems can be solved.
As datasets and programs continue to increase in size and complexity, there’s a growing need to harness computational resources, far beyond what a single CPU-only node can provide. cuNumeric brings GPU-accelerated supercomputing to the NumPy ecosystem.
Legate is an abstraction layer which runs on top of a runtime system, together providing scalable implementations of popular domain-specific APIs. It provides an API similar to Apache Arrow but provides stronger guarantees about data coherence and synchronization to aid library developers. NVIDIA cuNumeric layers on top of Legate like other libraries.
Legate democratizes computing by making it possible for all programmers to leverage the power of large clusters of CPUs and GPUs by running the same code that runs on a desktop or a laptop at scale. Using this technology, computational and data scientists can develop and test programs on moderately sized datasets on local machines and then immediately scale up to larger datasets deployed on many nodes in the cloud or on a supercomputer without any code modifications.
Getting Started on Github
The NVIDIA cuNumeric library on library:
- Provides seamless, drop-in NumPy replacement.
- Transparently accelerates and scales existing NumPy workflows onto multiple nodes across CPUs and GPUs.
- Achieves excellent scaling of compute-intensive applications up to thousands of GPUs.
- Requires zero code changes to ensure developer productivity.
- Is freely available through GitHub or Conda .
Weak Scaling of Richard-Lucy Deconvolution on DGX SuperPOD
Processing 10TB Microscopy Image Data as a Single NumPy Array
This multi-view lattice light-sheet microscopy example produces tens of terabytes (TB) of raw image data per day. Up until now, all processing has happened offline, after all the data has been collected. By moving all the preprocessing and reconstruction operations to GPUs and using cuNumeric on Legate, the data can be visualized in real time as it’s processed.
Get Started with cuNumeric today.