Data Science

RAPIDS cuDF Accelerates pandas Nearly 150x with Zero Code Changes

Decorative image of a computer screen against a purple background, with a dial on the side.

NVIDIA announced that RAPIDS cuDF can now bring GPU acceleration to 9.5M million pandas users without requiring them to change their code.

Watch the keynote replay from the recent AI and Data Science Virtual Summit hosted by NVIDIA.

pandas, a flexible and powerful data analysis and manipulation library for Python, is a top choice for data scientists because of its easy-to-use API. However, as dataset sizes grow, it struggles with processing speed and efficiency in CPU-only systems.

RAPIDS is an open-source suite of GPU-accelerated Python libraries designed to improve data science and analytics pipelines. RAPIDS cuDF is a GPU DataFrame library that provides a pandas-like API for loading, filtering, and manipulating data. In earlier releases of cuDF, it was meant for GPU-only development workflows.

With the latest release of RAPIDS v23.10, cuDF now brings accelerated computing to pandas workflows with no code changes through a unified CPU/GPU user experience with its new pandas accelerator mode. It’s available today in the open-source RAPIDS v23.10 release as an open beta and will be supported in NVIDIA AI Enterprise soon.

Video 1. Accelerate Pandas by Nearly 150X with RAPIDS cuDF

In the video, you can see identical pandas workflows running side-by-side: one uses pandas with CPU-only and the other uses pandas accelerator mode in RAPIDS cuDF.

Bringing a unified CPU/GPU experience to pandas workflows

cuDF has always provided users with top DataFrame library performance using a pandas-like API. However, adopting cuDF has sometimes required workarounds:

  • Working around any pandas functionality not yet implemented or supported in cuDF.
  • Designing separate code paths for CPU and GPU execution in codebases that require running on heterogeneous hardware.
  • Manually switching between cuDF and pandas when interacting with other PyData libraries or organization-specific tooling designed for pandas.

Starting with the RAPIDS v23.10 release, cuDF now provides a pandas accelerator mode to address these challenges, in addition to the existing GPU-only experience.

This feature was built for data scientists who want to continue using pandas as data sizes grow into the gigabytes and pandas performance slows. In cuDF’s pandas accelerator mode, operations execute on the GPU where possible and on the CPU (using pandas) otherwise, synchronizing under the hood as needed. This enables a unified CPU/GPU experience that brings best-in-class performance to your pandas workflows. 

With the latest release, cuDF now provides the following features:

  • Zero code change acceleration: Just load the cuDF Jupyter Notebook extension or use the cuDF Python module option.
  • Third-party library compatibility: pandas accelerator mode is compatible with most third-party libraries that operate on pandas objects. It will even accelerate pandas operations within these libraries.
  • Unified CPU/GPU workflows: Develop, test, and run in production with a single code path, regardless of hardware.

To bring GPU acceleration into your pandas workflows in a Jupyter notebook, load the cudf.pandas extension:

%load_ext cudf.pandas
import pandas as pd

To access it when running Python scripts, use the cudf.pandas module option:

python -m cudf.pandas

Bringing top performance to pandas workflows

As data sizes scale into the gigabytes, using pandas often becomes challenging due to slower performance, causing some data scientists to grudgingly give up the pandas API they love. With the new RAPIDS cuDF, you can keep using pandas as your primary tool and access the highest performance.

You can see this in action by running the pandas portion of the popular DuckDB Database-like Ops Benchmark originally developed by DuckDB’s benchmark setup compares popular CPU-based DataFrame and SQL engines on a series of common analytics tasks such as joining data together or computing statistical measures on a per-group basis.

With 5 GB of data, pandas performance slows to a crawl, taking minutes to perform the series of join and advanced groupby operations.

Historically, running this benchmark with cuDF rather than pandas has required changing the code and working around missing functionality. With cuDF’s new pandas accelerator mode, this is no longer an issue. You can run the pandas benchmark code unchanged and achieve significant speedups, using the GPU for most of the operations and the CPU for a small portion to ensure that the workflow succeeds.

The results are excellent. The cuDF unified CPU/GPU experience turns minutes of processing into just 1 or 2 seconds with no code change required (Figure 1).

Bar chart shows a 150x speed increase using pandas with RAPIDS cuDF on NVIDIA Grace Hopper. The 10 join operations take only 1 second with RAPIDS cuDF as opposed to 336 seconds with pandas on CPU, while the 10 groupby advanced operations take 2 seconds with RAPIDS cuDF compared to 288 seconds with pandas on CPU.
Figure 1. Performance comparison between Traditional pandas v1.5 on Intel Xeon Platinum 8480CL CPU and pandas v1.5 with RAPIDS cuDF on NVIDIA Grace Hopper

For more information about these benchmark results and how to reproduce them, see the cuDF documentation.


pandas is the most popular DataFrame library in the Python ecosystem, but it slows down as data sizes grow on CPUs. 

With cuDF’s pandas accelerator mode now available in open beta as part of the RAPIDS v23.10 release, you can now bring accelerated computing to your pandas workflows without needing to change your code. Based on an analytics benchmark processing a 5 GB dataset, you can achieve 150x faster processing times.

Take cuDF’s new pandas accelerator mode for a test drive with this detailed walkthrough notebook in a free GPU-enabled environment on Google Colab.

For more information, see the cuDF pandas accelerator mode page on the RAPIDS website.

Discuss (4)