The RAPIDS v24.10 release takes another step forward in bringing accelerated computing to data scientists and developers with a seamless user experience. This blog post highlights the new features including:
- Zero code change accelerated NetworkX is now generally available (GA)
- Polars GPU engine in open beta
- Bringing UMAP to larger-than-GPU-memory datasets
- Improved cuDF pandas compatibility with NumPy and PyArrow
- Guidelines for incorporating GPUs into GitHub-based CI systems
- RAPIDS-wide support for Python 3.12 and NumPy 2.x
Zero code change accelerated NetworkX
NetworkX accelerated by RAPIDS cuGraph is now GA in the 24.10 release, beginning with NetworkX 3.4. This release adds GPU-accelerated graph creation, a new user experience, and expanded documentation.
Accelerated graph construction enables full end-to-end acceleration for NetworkX workflows, which is particularly valuable for workflows on large graphs, where converting data between CPU and GPU can erode performance gains.
The full end-to-end accelerated NetworkX experience is now enabled by setting the `NX_CUGRAPH_AUTOCONFIG` environment variable to `True`.
%env NX_CUGRAPH_AUTOCONFIG=True
import pandas as pd
import networkx as nx

# Load the cit-Patents citation network as an edge list
url = "https://data.rapids.ai/cugraph/datasets/cit-Patents.csv"
df = pd.read_csv(url, sep=" ", names=["src", "dst"], dtype="int32")

# Graph construction is now GPU-accelerated as well
G = nx.from_pandas_edgelist(df, source="src", target="dst")

%time result = nx.betweenness_centrality(G, k=10)
End-to-end acceleration enables workflows using algorithms like betweenness centrality, PageRank, and more to see speedups of 10x, 50x, or even 500x on larger graphs, depending on the algorithm.
You can learn more about NetworkX accelerated by cuGraph in the documentation and explore the code for the benchmarks above here.
Zero code change accelerated Polars in open beta
In September, the Polars GPU engine powered by cuDF was released in open beta. With GPU support available in Polars, workflows can run up to 13x faster than on CPUs, with zero code changes required.
GPU support is built directly into the Polars Lazy API: users configure Polars to use the GPU by passing the `engine` keyword to `collect` when they trigger computation.
import polars as pl

df = pl.LazyFrame({"a": [1.242, 1.535]})
q = df.select(pl.col("a").round(1))

# Execute the lazy query on the GPU
result = q.collect(engine="gpu")
To learn more, read the NVIDIA and Polars announcement blogs or dive into the Polars GPU Support documentation. Or, jump right into a Google Colab notebook and take it for a test drive.
Bringing UMAP to larger-than-GPU-memory datasets
Beginning in v24.10, cuML’s UMAP algorithm now supports processing larger-than-GPU-memory datasets that would have resulted in an Out Of Memory error in earlier releases. By using a novel batched approximate nearest neighbor algorithm and optionally storing the full dataset in CPU memory, we’re able to build the approximate KNN graph while only processing subsets of the data on the GPU at any given time.
Users can tap into this new optional functionality by setting the new `nnd_n_clusters` keyword to any value greater than 1 (the default) and, if necessary, passing `data_on_host=True` to `fit` or `fit_transform`.
from cuml.manifold import UMAP
import numpy as np

# Generate synthetic data using numpy (random float32 matrix)
n_samples, n_features = 100_000, 64  # illustrative sizes; substitute your own data
X = np.random.rand(n_samples, n_features).astype(np.float32)
# UMAP parameters
num_clusters = 4 # Number of clusters for NN Descent batching, 1 means no clustering
data_on_host = True # Whether the data is stored on the host (CPU)
# UMAP model configuration
reducer = UMAP(
    n_neighbors=10,
    min_dist=0.01,
    build_algo="nn_descent",
    build_kwds={"nnd_n_clusters": num_clusters},
)
# Fit and transform the data
embeddings = reducer.fit_transform(X, data_on_host=data_on_host)
Users can start with an initial value of `nnd_n_clusters` (e.g., 4) and increase it as needed to manage GPU memory usage. Setting the value too high may introduce performance overhead from the multiple iterations of graph building, so it is worth finding a balance based on the size of the dataset and the GPU memory available. A re-run with a larger cluster count might look like the sketch below.
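A minimal sketch of that adjustment, reusing `X` from the example above (the cluster count of 16 is a hypothetical value, not a recommendation):

```python
from cuml.manifold import UMAP

# Raise the cluster count so each GPU batch covers a smaller
# subset of the data (16 is a hypothetical value)
reducer = UMAP(
    n_neighbors=10,
    min_dist=0.01,
    build_algo="nn_descent",
    build_kwds={"nnd_n_clusters": 16},
)
embeddings = reducer.fit_transform(X, data_on_host=True)
```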
Improved cuDF pandas ecosystem compatibility
Improved code compatibility
cuDF’s pandas accelerator mode is now fully compatible with NumPy arrays. Previously, running Python `isinstance` checks on NumPy arrays produced by the pandas API would return False when using cuDF pandas but True when using standard pandas. As this is a common code design pattern, some user workflows required workarounds to run smoothly.
Starting in v24.10, cudf.pandas produces true NumPy arrays when the accelerator mode is active and a user converts a DataFrame or column to an array, eliminating this issue. For example:
%load_ext cudf.pandas
import pandas as pd
import numpy as np
arr = pd.Series([1, 2, 3]).values # now returns a true numpy array
isinstance(arr, np.ndarray) # returns True
This change also enables code relying on the NumPy C API to work smoothly with cuDF pandas.
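For instance, compiled extensions that inspect array types at the C level now accept these arrays directly. A minimal sketch, assuming Numba is installed (Numba is our example here, standing in for any consumer of the NumPy C API):

```python
%load_ext cudf.pandas
import pandas as pd
from numba import njit

@njit
def total(x):
    # Numba compiles this loop against a genuine np.ndarray,
    # which cudf.pandas now produces
    s = 0.0
    for v in x:
        s += v
    return s

total(pd.Series([1.0, 2.0, 3.0]).to_numpy())  # returns 6.0
```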
Improved Arrow compatibility
cuDF now also supports a range of PyArrow versions. Arrow compatibility has been a long-running pain point for cuDF users: until now, every release of cuDF was tied to a single, specific release of Arrow due to our use of the Arrow C++ API and the binary compatibility requirements that usage imposed.
With this release, we’ve rewritten those features to exclusively use the Arrow C Data Interface, which in turn has allowed us to stop using Arrow C++ entirely. With that change, cuDF Python can now support any PyArrow version since PyArrow 14.
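In practice, this means cuDF can sit alongside whichever modern PyArrow your environment already pins. A minimal round-trip sketch (the table contents are illustrative):

```python
import pyarrow as pa
import cudf

# Build an Arrow table with any PyArrow >= 14
tbl = pa.table({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Move data into cuDF and back out to Arrow
gdf = cudf.DataFrame.from_arrow(tbl)
tbl_roundtrip = gdf.to_arrow()
```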
Guidelines for incorporating GPUs into GitHub-based CI systems
We’ve heard from the community that it can be challenging to figure out a simple and effective way to incorporate GPUs into GitHub-based CI systems. New guidelines for doing this effectively were added to the RAPIDS Deployment documentation, based on the scikit-learn team’s experience.
GitHub Actions now has support for hosted GPU runners. This means that any project on GitHub can leverage NVIDIA GPUs in their CI workloads for testing. This makes it much easier for projects to integrate with RAPIDS libraries and test that changes are compatible without needing GPU hardware locally.
GPU-hosted runners are not included in the GitHub Actions free tier. Runners with GPUs typically cost a few cents per minute, and projects can add a monthly spending cap to help keep costs under control.
To set up a GPU runner, navigate to the GitHub Actions section of your organization’s settings and add a new runner. Then select the NVIDIA Partner Image and give your runner a GPU by changing the Size to a GPU-powered VM.
Then you can configure your workflows to use your new runners with the `runs-on` option.
name: GitHub Actions GPU Demo
run-name: ${{ github.actor }} is testing out GPU GitHub Actions
on: [push]
jobs:
  gpu-workflow:
    runs-on: linux-nvidia-gpu
    steps:
      - name: Check GPU is available
        run: nvidia-smi
For more detailed information on setting up GPU-powered GitHub Actions workflows, check out the RAPIDS Deployment documentation, which also includes best practices on when to run your GPU CI to get the best bang for your buck.
The scikit-learn project recently set up GPU runners on GitHub Actions, using labels to manually trigger a GPU workflow on select PRs. Check out their blog post to learn about their experience.
RAPIDS platform updates
In 24.10, RAPIDS packages picked up some important updates allowing them to be used alongside newer versions of other scientific computing software. The packages now support Python 3.10-3.12 and NumPy 1.x and 2.x. They also now support fmt 11 and spdlog 1.14, the versions of those libraries now used across most of conda-forge. As part of these enhancements, this release also drops support for Python 3.9 and for NCCL versions older than 2.19.
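As a quick sanity check in a fresh 24.10 environment (the exact package set depends on how you installed RAPIDS):

```python
import sys
import numpy as np
import cudf

print(sys.version_info[:2])  # supported: (3, 10), (3, 11), or (3, 12)
print(np.__version__)        # both NumPy 1.x and 2.x are supported
print(cudf.Series(np.arange(3)) + 1)  # NumPy interop under either major version
```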
Conclusion
The RAPIDS 24.10 release takes another step forward in our mission to make accelerated computing more accessible to data scientists and engineers. We can’t wait to see what people do with these new capabilities.
If you’re new to RAPIDS, check out these resources to get started.