Legate

Legate is a framework and runtime, with an ecosystem of libraries, that democratizes distributed accelerated computing. It's designed to enable all users, including researchers, scientists, data engineers, and AI innovators, to scale their applications or scientific projects effortlessly without having to deal with the complexity of parallel and distributed computing. With Legate and Legate libraries, users can write sequential programs on a laptop and run that very same code, unchanged, on a workstation with GPUs, on GPU instances in public clouds, or on multi-GPU/multi-node supercomputers. Using this technology, AI/ML innovators, scientific computing researchers, and data scientists can develop and test programs on moderately sized datasets on local machines. They can then immediately scale up to larger datasets deployed on many nodes in the cloud, or on a supercomputer, without any code modifications.
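
For example, a program written with cuPyNumeric (Legate's NumPy-compatible library) looks exactly like ordinary NumPy code; only the import changes, and the launch configuration decides where it runs. A minimal sketch follows, with illustrative launcher commands in the comments (the flag names are assumptions and may differ between Legate versions):

    # The same script runs unchanged on a laptop CPU, a multi-GPU workstation, or a cluster.
    # Illustrative launch commands (assumed flags; consult the Legate documentation):
    #   legate my_program.py                       # single node, default resources
    #   legate --gpus 2 my_program.py              # single node, two GPUs
    #   legate --nodes 4 --gpus 8 my_program.py    # multi-node, multi-GPU
    import cupynumeric as np  # drop-in replacement for "import numpy as np"

    a = np.arange(1_000_000, dtype=float)
    b = np.sqrt(a) + 1.0
    print(float(b.sum()))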


Benefits

Productivity

Boost researcher and developer productivity with effortless implicit parallelism, eliminating the learning curve of GPU programming, parallel processing, and distributed computing.

Scalability

Scale from a single laptop to a distributed parallel architecture at data center scale with thousands of nodes and tens of thousands of CPUs and GPUs.

Flexibility

Transparently extend popular tools like NumPy, JAX, and RAPIDS™ on a common foundation that remains compatible and composable. The common core also enables third parties to build their own libraries on top of it.


Key Features

Implicit Parallelism

Implicitly convert simple sequential programs into parallel tasks.

cuPyNumeric provides three levels of implicit parallelism (a short sketch follows this list):

  1. Asynchronous execution of independent computations (tasks).

  2. Tasks that can be broken into sub-tasks and executed by different processes.

  3. Kernel-level parallelism using CUDA kernels.
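
A minimal sketch of the first two levels using cuPyNumeric (the runtime decides the actual schedule and partitioning):

    import cupynumeric as np

    x = np.arange(1_000_000, dtype=float)
    y = np.arange(1_000_000, dtype=float)

    # Level 1: the two reductions below are independent, so the runtime is free
    # to execute them asynchronously and overlap them.
    sx = x.sum()
    sy = y.sum()

    # Level 2: a single large element-wise expression is partitioned into
    # sub-tasks that run on different processors.
    z = x * y + 1.0

    # Level 3: on GPUs, each sub-task ultimately executes as CUDA kernels.
    print(float(sx), float(sy), float(z.sum()))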

Unified Data Abstraction

The Legate framework offers a unified data abstraction for Legate libraries.

The Legate runtime manages data partitioning, distribution, and synchronization, providing a strong guarantee of data coherence.

Different libraries are designed to share in-memory buffers of data without unnecessary copying.
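
As an illustration (a sketch using cuPyNumeric), the snippet below updates one slice of a distributed array and then reduces over the whole array; the runtime tracks the dependency and guarantees the update is visible before the reduction, without any explicit synchronization in user code:

    import cupynumeric as np

    a = np.zeros((8192, 8192))
    a[:4096, :] = 1.0   # an update that may touch only some partitions/devices
    total = a.sum()     # the runtime orders the reduction after the update
    print(float(total))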

Composable Libraries

Use cuPyNumeric and Legate Sparse for scientific computing.

Legate IO for distributed, high-throughput I/O.

Legate Rapids for data scientists and data engineers.

A Legate backend for JAX, for AI/ML and large language models (LLMs).

All libraries share a unified and coherent data representation.

Friendly User Interface 

A single command starts a distributed launch on supercomputers. Run within Jupyter Notebooks and on Kubernetes for cloud-native infrastructure. A Dask bootstrap is available for running on Dask clusters.
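
For example, a cluster run can typically be started with a single launcher invocation along the lines of legate --nodes 16 --gpus 8 my_program.py (the flag names here are illustrative and version-dependent), and the same script can also be driven interactively from a Jupyter notebook.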

Flexible Infrastructure

CPU and GPU, x86 and Arm.
Workstations, supercomputers, and public clouds.

Efficient Communication

GPUDirect for communication over NVLink, InfiniBand, Slingshot, and EFA.
Legate automatically chooses the fastest datapath.

Profiling & Debugging

Dataflow and event graphs generated for each run.

Timelines of the program's execution.
Nsight Systems support.
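
For example, the legate launcher can record a profile of a run (via an option along the lines of --profile; consult the documentation for the exact flag), producing an execution timeline that can be examined alongside Nsight Systems traces.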

SDK to Extend

Legate provides C++ and Python APIs that developers can use to build composable libraries on Legate.


Product Overview

As datasets and applications continue to increase in size and complexity, there's a growing need to harness computational resources far beyond what a single CPU-only system can provide. Legate provides an abstraction layer and a runtime system that together enable scalable implementations of popular domain-specific APIs. It offers an API similar to Apache Arrow but guarantees stronger data coherence and synchronization to aid library developers.

Legate enables major use cases, including AI/ML, scientific computing, and advanced data analytics, to run efficiently on large-scale, multi-GPU/multi-node infrastructure.

Legate distributed framework and runtime for accelerated computing

Scientific Computing

Legate provides augmented alternatives to popular scientific computing libraries like NumPy, SciPy, HDF5, and others, seamlessly adding multi-GPU, multi-node support. You can use familiar language tools to write research programs and let Legate scale them from CPU to GPU to multi-GPU, multi-node systems without changing any code.
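
For instance, a stencil-style research code written with standard NumPy slicing runs unchanged under cuPyNumeric; the sketch below is illustrative, and the grid size and iteration count are arbitrary:

    import cupynumeric as np  # swap back to "import numpy as np" to run with plain NumPy

    # A simple Jacobi-style relaxation expressed with ordinary NumPy slicing;
    # the Legate runtime partitions and distributes the work automatically.
    n = 4096
    grid = np.zeros((n, n))
    grid[0, :] = 1.0  # boundary condition along one edge

    for _ in range(100):
        grid[1:-1, 1:-1] = 0.25 * (
            grid[:-2, 1:-1] + grid[2:, 1:-1] + grid[1:-1, :-2] + grid[1:-1, 2:]
        )

    print(float(grid.mean()))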

Data Science & Analytics

Legate Dataframe helps data scientists scale their applications to larger and text-heavy datasets in demanding workloads, enhancing existing CPU- and GPU-accelerated data science and AI libraries.

AI/ML (coming soon…)

NVIDIA's Legate backend for JAX supports popular and future models. Whether you're building on LLMs like Llama 3, MoE architectures, or new models, the Legate backend for JAX enables flexible parallelism strategies for AI/DL training, achieving state-of-the-art performance on thousands of GPUs.


GTC Videos on Demand

Legate: A Productive Programming Framework for Composable, Scalable, Accelerated Libraries

GTC24 Talk Session

How to Create a Distributed GPU-Accelerated Python Library

GTC23 Talk Session

Legate: Scaling the Python Ecosystem

GTC21 Talk Session

Get Started with Legate

Legate Core Documentation

Full technical documentation with details about the Legate architecture, API calls, and developer tools can be found here.

Legate on GitHub

Legate software and documentation can be found in this GitHub repository. Check periodically for the latest updates and news.

cuPyNumeric Library

Learn more about the cuPyNumeric library for Python developers, providing scalable performance on multi-GPU/multi-node configurations.

Resources