Today, Polars released a new GPU engine powered by RAPIDS cuDF that accelerates Polars workflows up to 13x on NVIDIA GPUs, allowing data scientists to process hundreds of millions of rows of data in seconds on a single machine.
Growing data challenges
Traditional data processing libraries like pandas are single-threaded and become impractical to use beyond a few million rows of data. Distributed data processing systems can handle billions of rows but add complexity and overhead when processing small-to-medium-sized datasets.
There has been a gap in tools that process data efficiently for tens of millions up to a few hundred million rows of data. Such workloads are common for model development, demand forecasting, and logistics in industries like finance, retail, and manufacturing.
Polars is one of the fastest-growing Python libraries for data scientists and engineers, and was designed from the ground up to address these challenges. It uses advanced query optimizations to reduce unnecessary data movement and processing, allowing data scientists to smoothly handle workloads of hundreds of millions of rows on a single machine. Polars bridges the gap where single-threaded solutions are too slow and distributed systems add unnecessary complexity, offering an appealing “medium-scale” data processing solution.
Bringing NVIDIA accelerated computing to Polars
Polars leverages multi-threaded execution, advanced memory optimizations, and lazy evaluation to deliver significant acceleration out of the box compared to other CPU-only data manipulation tools.
However, as organizations across industries face growing data processing demands – from analyzing billions of financial transactions to managing complex inventory systems – even higher performance is required. This is where accelerated computing comes into play.
cuDF is part of the NVIDIA RAPIDS suite of CUDA-X libraries. It’s a GPU-accelerated DataFrame library that harnesses the massive parallelism of GPUs to significantly enhance data processing performance.
The Polars team partnered with NVIDIA to add the speed of cuDF to the efficiency of Polars for an additional performance boost, up to 13x compared to Polars on CPU. This allows users to maintain an interactive experience as their data processing workloads grow to hundreds of millions and even billions of rows of data.
The GPU engine is built directly into the Polars Lazy API: users can access GPU acceleration for their workflows by installing polars[gpu] via pip and passing engine="gpu" to the collect operation. Under the hood, Polars will attempt to execute operations on the GPU first and fall back to the CPU if necessary. This approach ensures:
- Efficient execution and minimal memory usage by using Polars’ query optimizer
- Access to the GPU engine without rewriting existing Polars queries
- Full compatibility with Polars’ growing ecosystem of data visualization, I/O, and machine learning libraries
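# Shell: install Polars with the GPU engine
pip install polars[gpu] --extra-index-url=https://pypi.nvidia.com
# Python: `transactions` is assumed to be an existing pl.LazyFrame
import polars as pl
(
    transactions
    .group_by("CUST_ID")
    .agg(pl.col("AMOUNT").sum())         # total amount per customer
    .sort(by="AMOUNT", descending=True)  # largest totals first
    .head()                              # keep the first 5 rows (the default)
    .collect(engine="gpu")               # execute on the GPU engine
)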
Conclusion
The Polars GPU engine powered by RAPIDS cuDF is now available in open beta, offering data scientists and engineers in every industry a powerful tool for medium-scale data processing. It accelerates Polars workflows up to 13x on NVIDIA GPUs, efficiently handling datasets of hundreds of millions of rows without the overhead of distributed systems. The Polars GPU engine is built directly into the Polars API, making it easily accessible to every user.
Getting Started with the Polars GPU Engine
Check out these resources to learn more and get started with the Polars GPU engine:
- Introductory notebook available on GitHub and Colab
- Polars Release Blog
- Polars User Guide