NVIDIA cuML: GPU-Accelerated Machine Learning
NVIDIA cuML is an open-source CUDA-X™ Data Science library that accelerates scikit-learn, UMAP, and HDBSCAN on GPUs—supercharging machine learning workflows with no code changes required.
Key Features
Maximizes Performance on NVIDIA GPUs
cuML optimizes fundamental machine learning operations for execution on GPUs, significantly shortening model development and training times and enabling quicker testing and parameter-tuning iterations.
Zero-Code-Change Acceleration
cuML includes an API (cuml.accel) that can run your existing scikit-learn, UMAP, or HDBSCAN code on GPUs with no code modifications.
CPU Fallback
cuML’s zero-code-change API (cuml.accel) keeps your scikit-learn, UMAP, and HDBSCAN code running by automatically dispatching each operation to the GPU when it is supported and falling back to the CPU when it is not. Read more in How It Works.
Flexibility
cuML includes two interfaces: a zero-code-change API for popular machine learning algorithms and a Python GPU-only machine learning library similar to scikit-learn with comprehensive coverage. Learn more in the docs.
Scalability
cuML efficiently utilizes single-GPU systems to process large datasets that overwhelm CPU-based implementations of core machine learning libraries.
Distributed Computing
cuML accelerates distributed machine learning applications at scale, with real-world examples of up to 6 TB datasets on multi-node-multi-GPU clusters via the popular Apache Spark MLlib API.
Turn cuML On to Accelerate scikit-learn by 50x
NVIDIA cuML runs popular machine learning algorithms like scikit-learn Random Forest, UMAP, and HDBSCAN on GPUs with zero code changes.
Test Drive cuML
Intro Blog: cuML Accelerator
NVIDIA cuML brings zero-code-change GPU acceleration with massive speedups to scikit-learn, UMAP, and HDBSCAN.
Colab Quickstart: Hands-On cuML Tutorial
cuML comes preinstalled in Google Colab, making it incredibly easy to get started. Simply switch to a GPU runtime and use this notebook to try cuml.accel for scikit-learn, UMAP, and HDBSCAN.
Install cuML
To get started, install cuML using the code snippets below.
Quick Install With conda
1. If not installed, download and run the install script. This will install the latest Miniforge:
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
2. Then install with:
conda create -n rapids-25.06 -c rapidsai -c conda-forge -c nvidia \
    cuml=25.06 python=3.13 'cuda-version>=12.0,<=12.9'
Quick Install With pip
Install via the NVIDIA PyPI index:
pip install \
    --extra-index-url=https://pypi.nvidia.com \
    "cudf-cu12==25.6.*" \
    "cuml-cu12==25.6.*"
See the complete install selector for docker, WSL2, and individual libraries.
Enable cuML Acceleration of scikit-learn, UMAP, and HDBSCAN With Zero Code Changes
Once cuML is installed, you can enable the cuml.accel module to accelerate scikit-learn, UMAP, and HDBSCAN workflows with no code changes. Note that not all cuML estimators are supported in cuml.accel (open beta) today. Read about the known limitations to understand what is and isn't covered.
To use cuml.accel, enable it using one of these methods before importing scikit-learn, UMAP, or HDBSCAN:
To accelerate IPython or Jupyter notebooks, use the magic command:
%load_ext cuml.accel
import sklearn
...
To accelerate a Python script, use the Python module flag on the command line:
python -m cuml.accel script.py
If you can't use command-line flags, explicitly enable cuml.accel via import:
import cuml.accel
cuml.accel.install()

import sklearn
...
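Putting the programmatic method together, a minimal sketch looks like the following. The try/except is only so the sketch also runs where cuML is not installed; on a machine with cuML and a GPU, the two lines in the try block are all you need, and the scikit-learn code below them is completely unchanged:

```python
# Enable cuml.accel BEFORE importing scikit-learn, then use sklearn as usual.
try:
    import cuml.accel
    cuml.accel.install()
except Exception:  # cuML missing or no usable GPU: fall through to CPU sklearn
    pass

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

Note that with cuml.accel the fallback handling is automatic; the except clause here is purely so the example degrades gracefully on CPU-only machines.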
50x Faster scikit-learn
Average training-time speedup for traditional machine learning algorithms: unmodified scikit-learn code running on GPU via cuml.accel vs. scikit-learn on CPU.
Specs: NVIDIA cuML 25.02 on NVIDIA H100 80GB HBM3, scikit-learn v1.5.2 on Intel Xeon Platinum 8480CL
60x Faster UMAP, 175x Faster HDBSCAN
Average training-time speedup: unmodified UMAP and HDBSCAN code running on GPU via cuml.accel vs. umap-learn and hdbscan on CPU.
Specs: NVIDIA cuML 25.02 on NVIDIA H100 80GB HBM3, umap-learn v0.5.7, hdbscan v0.8.40 on Intel Xeon Platinum 8480CL
Hands-On Tutorials: Accelerate scikit-learn, UMAP, and HDBSCAN
Dive into these resources to accelerate your machine learning workflows with cuML, including hands-on examples of advanced ML techniques, specialized applications, and deployment optimizations.
Starter Kit: Accelerate Topic Modeling
This kit demonstrates how to significantly improve performance for topic modeling by minimizing noise clusters and leveraging a rewards-guided, GPU-accelerated method with BERTopic and cuml.accel.
Starter Kit: Stacking Using cuML
This kit shows how to achieve high-performance stacking by using the speed of GPUs to efficiently train and combine numerous diverse models, maximizing accuracy in complex tabular data challenges.
Starter Kit: Accelerate Single-Cell Genomics
This kit demonstrates techniques to measure and analyze single-cell data at scale, accelerating analysis cycles and saving significant time by leveraging GPUs for genomics workflows.
Accelerate Time-Series Forecasting
This blog demonstrates how cuML accelerates time-series forecasting, enabling you to work with larger datasets and forecast windows using skforecast for faster iteration.
Accelerate UMAP Dimensionality Reduction
This blog demonstrates how cuML dramatically speeds up UMAP workflows, transforming processing times from days to hours. Learn how GPU acceleration simplifies large-scale dimensionality reduction.
Supercharge Tree Model Inference With FIL
This blog highlights how Forest Inference Library (FIL) delivers blazing-fast inference for tree models within cuML. Explore new capabilities, performance gains, and features to optimize your model deployment.
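As a rough sketch of the workflow the FIL blog describes: train a tree ensemble with ordinary scikit-learn, then load it into cuML's Forest Inference Library for accelerated batch inference. The FIL half is commented out because it requires cuML and an NVIDIA GPU; treat the loader call as illustrative and check the FIL documentation for the current API:

```python
# Train a tree ensemble with scikit-learn (runs anywhere, CPU-only).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, random_state=1)
model = GradientBoostingClassifier(n_estimators=50, random_state=1).fit(X, y)

# On a system with cuML and an NVIDIA GPU (illustrative sketch; see FIL docs):
# from cuml.fil import ForestInference
# fil_model = ForestInference.load_from_sklearn(model)
# preds = fil_model.predict(X)   # GPU-accelerated batch inference

print(f"CPU training accuracy: {model.score(X, y):.2f}")
```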
How cuML Accelerates scikit-learn, UMAP, and HDBSCAN
cuML introduced zero-code-change acceleration in open beta with the cuml.accel module. When this module is loaded, importing scikit-learn, umap-learn, or hdbscan lets cuML "intercept" the estimators from these CPU libraries, turning each one into a proxy that can be backed by either a GPU or a CPU implementation at any given time.
When you use an estimator, your code will use cuML’s GPU-accelerated implementation under the hood if it can. If it can’t, it will fall back to standard CPU scikit-learn. This works in reverse as well. If you’ve already trained a model on the GPU and a particular method isn’t supported, cuML will reconstruct the trained model on the CPU and use the scikit-learn version.
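The key point of the proxy design is that the code you write stays plain scikit-learn; whether a given fit or predict call runs on GPU or CPU is decided at call time. A minimal illustration (run as-is this uses CPU scikit-learn; run via `python -m cuml.accel`, the same unmodified code is dispatched to the GPU where supported):

```python
# Ordinary scikit-learn code: nothing here refers to cuML or the GPU.
# Under cuml.accel, the intercepted KMeans proxy routes supported calls to
# the GPU implementation and falls back to this exact CPU path otherwise.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
print(km.cluster_centers_.shape)  # (4, 2)
```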
Read more about the rapidly growing list of algorithms and parameters that the zero-code-change interface covers.
cuML also provides an API that mirrors scikit-learn, supports a much wider set of algorithms, and is suitable for users looking to maximize performance for their bespoke applications. Read about it in cuML's documentation.
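Because the native API mirrors scikit-learn, moving to it is usually just an import swap. A sketch (the sklearn line runs anywhere; the cuml line, commented out, requires cuML and an NVIDIA GPU):

```python
# cuML's native estimators mirror scikit-learn's interface, so switching
# libraries is typically a one-line import change.
from sklearn.decomposition import PCA   # CPU implementation
# from cuml.decomposition import PCA    # GPU implementation, same interface

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
pca = PCA(n_components=2).fit(X)
X2 = pca.transform(X)
print(X2.shape)  # (200, 2)
```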
Data Science Training From NVIDIA
Join the Community
Ethical AI
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this software in accordance with our terms of service, developers should work with their supporting team to ensure their application meets the requirements of the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.