RAPIDS cuDF Accelerates pandas Nearly 150x with Zero Code Changes

At NVIDIA GTC 2024, it was announced that RAPIDS cuDF can now bring GPU acceleration to 9.5M million pandas users without requiring them to change their code.

pandas, a flexible and powerful data analysis and manipulation library for Python, is a top choice for data scientists because of its easy-to-use API. However, as dataset sizes grow, it struggles with processing speed and efficiency in CPU-only systems.

RAPIDS is an open-source suite of GPU-accelerated Python libraries designed to improve data science and analytics pipelines. RAPIDS cuDF is a GPU DataFrame library that provides a pandas-like API for loading, filtering, and manipulating data. In earlier releases of cuDF, it was meant for GPU-only development workflows.

Last fall, RAPIDS released a version of cuDF that brought accelerated computing to pandas workflows with no code changes through a unified CPU/GPU user experience in open beta. At GTC 2024, NVIDIA announced that the cuDF acceleration of pandas is now generally available in the latest release of RAPIDS v24.02. This feature will be supported by NVIDIA AI Enterprise 5.0 later in the spring.

Video 1. Accelerate Pandas by Nearly 150X with RAPIDS cuDF

In the video, you can see identical pandas workflows running side by side in a Jupyter notebook: one uses pandas with CPU-only, one loads the cudf.pandas extension to accelerate pandas with RAPIDS cuDF.

Bringing a unified CPU/GPU experience to pandas workflows

cuDF has always provided users with top DataFrame library performance using a pandas-like API. However, adopting cuDF has sometimes required workarounds:

Working around any pandas functionality not yet implemented or supported in cuDF.
Designing separate code paths for CPU and GPU execution in codebases that require running on heterogeneous hardware.
Manually switching between cuDF and pandas when interacting with other PyData libraries or organization-specific tooling designed for pandas.

In the 24.02 release, cuDF accelerates pandas with zero code changes to address these challenges, in addition to the existing GPU-only experience.

This feature was built for data scientists who want to continue using pandas as data sizes grow into gigabytes and performance slows. When cuDF accelerates pandas, operations will run on the GPU if possible, and on the CPU (using pandas) otherwise. cuDF synchronizes between the GPU and CPU under the hood as needed. This enables a unified CPU/GPU experience bringing best-in-class performance to your pandas workflows.

With the GA release, cuDF provides the following features:

Zero code change acceleration: Just load the cuDF Jupyter Notebook extension or use the cuDF Python module option.
Third-party library compatibility: pandas accelerator mode is compatible with most third-party libraries that operate on pandas objects. It will even accelerate pandas operations within these libraries.
Unified CPU/GPU workflows: Develop, test, and run in production with a single code path, regardless of hardware.

To bring GPU acceleration into your pandas workflows in a Jupyter notebook, load the cudf.pandas extension:

%load_ext cudf.pandas
import pandas as pd

To access it when running Python scripts, use the cudf.pandas module option:

python -m cudf.pandas script.py

Bringing top performance to pandas workflows

As data sizes scale into the gigabytes, using pandas often becomes challenging due to slower performance, causing some data scientists to give up the pandas API they love. With the new RAPIDS cuDF, you can continue using pandas as your primary tool and access the highest performance.

You can see this in action by running the pandas portion of the popular DuckDB Database-like Ops Benchmark originally developed by H2o.ai. DuckDB’s benchmark setup compares popular CPU-based DataFrame and SQL engines on a series of common analytics tasks, such as joining data together or computing statistical measures per group.

With 5 GB of data, pandas performance slows to a crawl, taking minutes to perform the series of join and advanced group by operations.

Historically, running this benchmark with cuDF rather than pandas has required changing the code and working around missing functionality. With cuDF’s new pandas accelerator mode, this is no longer an issue. You can run the pandas benchmark code unchanged and achieve significant speedups, using the GPU for most of the operations and the CPU for a small portion to ensure that the workflow succeeds.

The results are excellent. The cuDF unified CPU/GPU experience turns minutes of processing into just 1 or 2 seconds with no code change required (Figure 1).

Bar chart shows a 150x speed increase using pandas with RAPIDS cuDF on NVIDIA Grace Hopper. — *Figure 1.* *Standard DuckDB Data Benchmark (5 GB)* *performance comparison between cuDF.pandas and traditional pandas v2.2*

HW: NVIDIA Grace Hopper, CPU: Intel Xeon Platinum 8480C | SW: pandas v2.2, RAPIDS cuDF 23.10

For more information about these benchmark results and how to reproduce them, see the cuDF documentation.

Conclusion

pandas is the most popular DataFrame library in the Python ecosystem, but it slows down as data sizes grow on CPUs.

With a single command, you can use cuDF to bring accelerated computing to your pandas workflows without changing your code. Based on an analytics benchmark processing a 5 GB dataset, you can achieve 150x faster processing times.

Take cuDF acceleration of pandas for a test drive with this detailed walkthrough notebook in a free GPU-enabled environment on Google Colab.

For more information, see the cuDF pandas page on the RAPIDS website.