CUDA-X Data Science

CUDA-X™ Data Science is a collection of open-source libraries that accelerate popular data science libraries and platforms. It is part of the CUDA-X collection of highly optimized, domain-specific libraries built on CUDA®.

CUDA-X Data Science includes zero-code-change APIs that accelerate popular PyData tools like pandas and scikit-learn, as well as distributed computing frameworks like Apache Spark. With more than 100 integrations with open-source libraries and tools in the data science and data processing ecosystem, CUDA-X Data Science democratizes access to accelerated data science.

CUDA-X Data Science Libraries

Accelerate data analytics, machine learning, graph analytics, and data-intensive applications such as vector search. Get the highest possible performance on a single GPU, or scale to distributed systems, using simple zero-code-change interfaces.

cuDF: 50x Faster pandas

cuDF is a GPU-accelerated library that optimizes fundamental DataFrame operations. It includes drop-in accelerators for popular DataFrame tools like pandas, Polars, and Apache Spark with no code changes required.
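
As a rough illustration of what "no code changes" can look like for pandas, the sketch below enables the `cudf.pandas` accelerator mode described in the cuDF documentation before pandas is imported. The helper name `enable_cudf_pandas` and the CPU fallback are illustrative assumptions, not part of the library.

```python
import importlib.util


def enable_cudf_pandas() -> bool:
    """Hypothetical helper: try to enable the cuDF pandas accelerator.

    Must be called before pandas is imported. Returns True on success.
    """
    try:
        import cudf.pandas  # shipped with the cudf package

        cudf.pandas.install()
        return True
    except Exception:
        # cudf is not installed or no usable GPU was found; stay on CPU pandas.
        return False


gpu_enabled = enable_cudf_pandas()
totals = None

if importlib.util.find_spec("pandas") is not None:
    import pandas as pd  # ordinary pandas code from here on

    df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
    totals = df.groupby("key")["value"].sum().to_dict()

print("accelerated:", gpu_enabled, "totals:", totals)
```

The pandas code itself is identical in both cases. An existing script can also be run unmodified with `python -m cudf.pandas script.py`, or a notebook accelerated with `%load_ext cudf.pandas`.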

Tags: pandas, DataFrame, Python, C++

cuML: 50x Faster scikit-learn

cuML is a GPU-accelerated machine learning library that optimizes machine learning algorithms for execution on GPUs. It includes accelerators that run machine learning algorithms in scikit-learn, UMAP, and HDBSCAN with no code changes required. 
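
A minimal sketch of the scikit-learn accelerator, assuming the `cuml.accel.install()` entry point described in the cuML documentation is available and is called before scikit-learn is imported. The helper name `enable_cuml_accel`, the toy data, and the CPU fallback are illustrative assumptions.

```python
import importlib.util


def enable_cuml_accel() -> bool:
    """Hypothetical helper: try to enable the cuML accelerator.

    Must be called before scikit-learn is imported. Returns True on success.
    """
    try:
        import cuml.accel  # assumption: provided by the cuml package

        cuml.accel.install()
        return True
    except Exception:
        # cuml is not installed or no usable GPU was found; stay on CPU.
        return False


accelerated = enable_cuml_accel()
labels = None

if importlib.util.find_spec("sklearn") is not None:
    from sklearn.cluster import KMeans  # unchanged scikit-learn code

    # Two well-separated pairs of points, so two clusters are expected.
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    labels = [int(label) for label in model.labels_]

print("accelerated:", accelerated, "labels:", labels)
```

The cuML documentation also describes running an unmodified script with `python -m cuml.accel script.py`.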

Tags: scikit-learn, machine learning, Python, C++

cuGraph: 48x Faster NetworkX

cuGraph is a GPU-accelerated graph analytics library that optimizes graph algorithms for execution on GPUs to process millions of nodes without specialized software. It includes a zero-code-change accelerator for NetworkX. 
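
One way to picture the NetworkX accelerator: NetworkX can dispatch an algorithm call to an installed backend through its `backend` keyword. The sketch below assumes the `nx-cugraph` package (importable as `nx_cugraph`) registers a backend named `cugraph`, and runs on the default CPU implementation when that package is absent.

```python
import importlib.util

scores = None

if importlib.util.find_spec("networkx") is not None:
    import networkx as nx

    graph = nx.karate_club_graph()  # small built-in example graph

    # Assumption: the nx-cugraph package registers a backend named "cugraph".
    # Only request it when that package is importable; otherwise run on CPU.
    use_gpu = importlib.util.find_spec("nx_cugraph") is not None
    backend_kwargs = {"backend": "cugraph"} if use_gpu else {}

    scores = nx.betweenness_centrality(graph, **backend_kwargs)
    top = max(scores, key=scores.get)
    print("backend:", "cugraph" if use_gpu else "default", "top node:", top)
```

The explicit keyword is used here only to make the dispatch visible. The nx-cugraph documentation describes an environment variable, `NX_CUGRAPH_AUTOCONFIG=True`, that is intended to route supported calls to the GPU without editing the code at all.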

Tags: NetworkX, graph, Python, C++

Apache Spark Accelerated with cuDF

Learn more about our accelerator plug-in for Apache Spark workflows.

Tags: machine learning, data processing, distributed computing, Scala, Python

Dask-RAPIDS

Scale out GPU-accelerated data science pipelines to multiple nodes on Dask.

Tags: distributed computing, Python

cuxfilter

Create interactive data visuals with multidimensional filtering of tabular datasets of over 100 million rows.

Tags: dashboards, visualization, Python

cuCIM

Mirror scikit-image for image manipulation and OpenSlide for image loading with the cuCIM API.

Tags: computer vision, vision processing, Python

cuVS

Apply cuVS algorithms to accelerate vector search, including world-class performance from CAGRA.

Tags: vector search, Python, C++, C, Rust

RAFT

Use RAFT’s CUDA-accelerated primitives to rapidly compose analytics.

Tags: primitives, algorithms, CUDA, Python, C++

KvikIO

Take full advantage of NVIDIA® GPUDirect® Storage (GDS) through powerful bindings to cuFile.

Tags: file I/O, GPUDirect Storage, Python, C++

Other CUDA-X Data Science and Processing Libraries

See a complete list of libraries and tools.


Get Started

Starter Kit: Accelerated Data Analytics With pandas Code

This kit demonstrates how to create responsive dashboards on large-scale data using pandas code and PyViz libraries, leveraging cuDF for accelerated exploratory data analytics with zero code changes.

Starter Kit: Accelerated Machine Learning on XGBoost

XGBoost is one of the most popular Python libraries for gradient-boosted decision trees. It powers machine learning models for classification, regression, and ranking workflows.

Starter Kit: Accelerated Machine Learning With cuML Code

cuML accelerates popular machine learning algorithms, including Random Forest, UMAP, and HDBSCAN.

Starter Kit: Accelerated Data Analytics With Apache Spark

The NVIDIA RAPIDS™ accelerator for Apache Spark accelerates enterprise-level data workloads to drive cost savings.

Starter Kit: Accelerated Data Analytics With Polars Code

Polars is known for its high performance and memory optimizations. Experience even faster execution when you call the GPU engine powered by cuDF.
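
A minimal sketch of calling the GPU engine: a Polars lazy query is built as usual, and only the final `collect` call selects the engine. The check for the `cudf_polars` package and the CPU fallback are illustrative assumptions about how a script might guard the call.

```python
import importlib.util

result = None

if importlib.util.find_spec("polars") is not None:
    import polars as pl

    # The query definition is the same for the CPU and GPU engines.
    query = (
        pl.LazyFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
        .group_by("key")
        .agg(pl.col("value").sum())
        .sort("key")
    )

    # Assumption: the GPU engine is provided by the cudf_polars package.
    if importlib.util.find_spec("cudf_polars") is not None:
        result = query.collect(engine="gpu")
    else:
        result = query.collect()  # default CPU engine

    print(result)
```

Keeping the engine choice at the `collect` call means the rest of the pipeline does not need to know where it runs.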

Starter Kit: Accelerated Graph Analytics With NetworkX Code

cuGraph accelerates popular NetworkX graph algorithms, including Louvain, betweenness centrality, and PageRank.


Install and Deploy in Your Environment

Quick Install With conda

1. If conda is not installed, download and run the install script. This installs the latest Miniforge:

Bash
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"

2. Then install with:

Bash
conda create -n rapids-25.08 -c rapidsai -c conda-forge -c nvidia rapids=25.08 python=3.13 cuda-version=12.9

Quick Install With pip

Install via the NVIDIA PyPI index:

Bash
pip install \
    --extra-index-url=https://pypi.nvidia.com \
    "cudf-cu12==25.8.*" \
    "dask-cudf-cu12==25.8.*" \
    "cuml-cu12==25.8.*" \
    "cugraph-cu12==25.8.*"
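
The package specifiers in the pip command share one pattern: library name, a `-cu12` suffix for the CUDA major version, and a pinned release. As a sketch only, a small hypothetical helper (`rapids_pip_command`, not part of any NVIDIA tool) can assemble the same command for a chosen set of libraries. Treating the CUDA major version as a parameter is an assumption; the source shows only `cu12`.

```python
import shlex

NVIDIA_INDEX = "https://pypi.nvidia.com"


def rapids_pip_command(packages, cuda_major=12, release="25.8"):
    """Build the pip argument list for the given library names.

    Hypothetical helper for illustration; it only formats strings and
    does not check that a package exists for the requested CUDA version.
    """
    specs = [f"{name}-cu{cuda_major}=={release}.*" for name in packages]
    return ["pip", "install", f"--extra-index-url={NVIDIA_INDEX}", *specs]


cmd = rapids_pip_command(["cudf", "dask-cudf", "cuml", "cugraph"])

# shlex.join quotes each specifier so the shell does not expand the "*".
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` or printed and pasted into a shell.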

The Accelerated Data Science Ecosystem

Data practitioners in open-source libraries, commercial software, and industries are driving innovation with CUDA-X Data Science.

Open-source libraries: Apache Arrow, Apache Spark, CuPy, Dask, DMLC XGBoost, HoloViz, NetworkX, Numba, Polars, PyG, PyTorch, scikit-learn, scverse

Join the Community

Join the Accelerated Data Science Community on Slack

Sign Up for the Data Science Newsletter


Ethical AI

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using these libraries in accordance with our terms of service, developers should work with their supporting team to ensure their application meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.

Download CUDA-X Data Science libraries today.
