Building an Accelerated Data Science Ecosystem: RAPIDS Hits Two Years

GTC Fall 2020 marked the second anniversary of the initia l release of RAPIDS. Created out of the GPU Open Analytics Initiative (GoAi) aimed at making accelerated, end-to-end analytics on GPUs easy, RAPIDS has proven GPUs are performant, easy to use, and transformative to the future of data analytics. By thinking about the relationship between software and hardware holistically, the contributors behind the RAPIDS ecosystem are making the vast potential of accelerated computing accessible to data practitioners across industries and institutions.

The graphic shows the RAPIDS GPU speedup over CPU, a brief list of customer use cases, and the logos of companies using RAPIDS in their own data science products and solutions, including AWS, Google Cloud Platform, Microsoft Azure, and Oracle. — *Figure 1. NVIDIA RAPIDS, an open-source software accelerator for end-to-end data science workloads on GPUs, has grown substantially in the last two years. Source:* *GTC 2020 Fall keynote*.

Shipped with a subset of the current core libraries, RAPIDS’ initial release scratched the surface of the impact that accelerated computing could have on data science. Fast forward two years and RAPIDS 0.16 reflects the maturity and growth of those core libraries, the many use cases enabled by RAPIDS, and the enterprises, institutions, and people using accelerated computing to power their data science. Between RAPIDS 0.1 and 0.16, we’ve seen the continued acceleration of the entire data science lifecycle and the expansion of the RAPIDS ecosystem, with more and more software and products adopting accelerated computing. These last two years have bridged the gap between data practitioners and high-performance computing (HPC), enabling the application of sophisticated data science to products and services to create more data-driven experiences for users and customers.

Constantly growing performance and applicability

When RAPIDS was originally released, the initial speed-ups were massive. Compared to similar CPU-based implementations, RAPIDS was delivering nearly 50x performance improvements using NVIDIA GPUs for classical machine learning (ML) processes.

As RAPIDS has matured, speedups have continued to improve, drastically reducing the total cost of ownership (TCO) for large data science operations. Running an industry-standard big data benchmark at 10-TB scale, RAPIDS performs at up to 20x faster on GPUs than the top CPU baseline. Using just 16 NVIDIA DGX A100 systems to achieve the performance of 350 CPU-based servers, the NVIDIA solution is 7x more cost-effective while delivering HPC-level performance.

By focusing on whole-stack innovation, RAPIDS continues to improve as NVIDIA hardware and software advance. RAPIDS supports CUDA 11 and Ampere GPUs across all libraries, putting the power of the latest GPUs in data practitioners’ hands.

RAPIDS also continues to expand its applicability to a wide variety of use cases. The focus of the first RAPIDS release was basic data ingest, data processing, ETL, and common ML methods. In the latest RAPIDS release, I see continued maturity and growth. Libraries like cuSpatial and cuSignal are now tackling sophisticated use cases. RAPIDS also supports the deep learning (DL) community with the ability to easily pass data into other popular frameworks like PyTorch, TensorFlow, MXNet, and Chainer to enable further acceleration for DL use cases.

As the needs of the data science ecosystem grow, RAPIDS has grown with it. There are more use cases for streaming data processing, and cuStreamz was developed to provide a performant yet cost-effective solution to address these growing needs. We’re also seeing the rise of recommender systems to improve and personalize advertisements. NVTabular, a feature engineering and preprocessing library built upon cuDF, offers a robust solution to manipulate terabytes of tabular datasets quickly and easily. As the battle against COVID-19 rages on, streamlining drug discovery has become more important than ever. Leveraging RAPIDS, NVIDIA Clara helps accelerate the process to reduce the discovery time for lifesaving drugs and enables a wide variety of other critical healthcare use cases.

This graphic details the NVIDIA Clara framework aimed accelerating healthcare applications, and the modules that RAPIDS powers. — *Figure 2. NVIDIA Clara is a healthcare application framework for AI-powered imaging, genomics, and for the development and deployment of smart sensors. Source:* *GTC 2020 Fall keynote*.

Connecting more data practitioners to HPC

From the start, the RAPIDS community decided to follow the standards set by the thriving PyData ecosystem to ensure a minimal learning curve and a familiar user experience when moving to GPUs. By combining performance with accessibility, RAPIDS has put the power of HPC in the hands of practitioners across industries and organizations.

Many of the benchmarks of RAPIDS-based solutions have entered into the realm of HPC performance, whether conducted on supercomputers like Summit or in commercial cloud environments. Due to its accessibility and power, RAPIDS is beginning to provide an abstraction layer to traditional supercomputing systems, as seen at institutions like the National Energy Research Scientific Computing Center (NERSC) and Oak Ridge National Lab (ORNL).

At NERSC, experts are using Dask and RAPIDS to provide a familiar and intuitive user interface on top of their latest supercomputer “Perlmutter,” making its power easily accessible by researchers and scientists with limited background in supercomputing. ORNL uses Dask, RAPIDS, and BlazingSQL on its Summit supercomputer to screen small-molecule compounds that bind with the SARS-CoV-2 main protease. Using a Jupyter notebook on their laptops, data engineers were able to get this custom workflow up and running in less than two weeks and see subsecond query results.

Even without access to traditional supercomputers, we’re also beginning to see HPC-level performance in commercial cloud environments. Using Google Cloud Platform, we’re able to get incredible performance running a common ML model training pipeline using cuIO, cuDF, and XGBoost. Between loading and cleansing data, engineering features, and training a classifier using a 200GB CSV dataset, a RAPIDS-based pipeline completed its operations in just over two minutes on 16 NVIDIA DGX A100 systems. The same process took two hours and a half hours using 20 CPU nodes (61-GB memory, eight vCPUs, 64-bit platform). Figure 3 shows using RAPIDS on NVIDIA GPUs in Google’s commercial cloud environment, with 70x speedups against comparable CPU configurations.

This chart shows the massive performance gains that RAPIDS provides using GPUS, outperforming comparable CPU-based implementations by up to 70x. — *Figure 3. RAPIDS has continued to improve performance over the last two years. Most recently, RAPIDS delivered 70x speed-ups using NVIDIA A100 GPUs on GCP compared to CPU-based implementations.*

As more cloud providers onboard the latest GPU technologies, RAPIDS allows users to hone that power into HPC-powered operations. We’re beginning to see RAPIDS integrated into major cloud ML platforms, such as Amazon Sagemaker, Microsoft AzureML, and Google Cloud Dataproc, making RAPIDS-enabled solutions and power even more accessible. The recent GTC keynote announcement with Cloudera highlights a new initiative to accelerate the Cloudera Data Platform for enterprises using on-premises Hadoop and hybrid cloud deployments.

A thriving ecosystem and a growing market

RAPIDS is quickly becoming the backbone of the accelerated analytics ecosystem, integrating with popular PyData libraries and enterprise data science solutions. As the ecosystem continues to broaden, more industry leaders and companies are adopting RAPIDS to accelerate their solutions.

RAPIDS integrations

By focusing on creating bridges instead of forks, RAPIDS has organically integrated into popular data science tools, with many developments even being pushed upstream from RAPIDS.

Tightly knit with Dask, more and more users rely on Dask to horizontally scale their Python toolsets and RAPIDS to vertically scale for performance.
In its 1.0 release, XGBoost integrated RAPIDS functionality to provide native support for CUDA workers, accelerating model training operations for lightning-fast speeds.
We’re even beginning to see the expansion into use cases like AutoML through integrations with TPOT, making the application of ML more accessible and more feasible for businesses.
Expanding outside of data science, integrations between RAPIDS and Plotly Dash are allowing Pythonistas to easily visualize terabyte-scale datasets to empower data analysts and drive business decisions.

RAPIDS is also integrating with the Apache Spark ecosystem to accelerate Spark 3.0 data science pipelines.

RAPIDS adoption in enterprises

As the GPU-accelerated ecosystem continues to broaden and enable more business cases, we see more and more enterprises adopting RAPIDS to bolster their analytics operations.

With a large community of Python-friendly data scientists, Capital One uses Dask and RAPIDS to scale and accelerate traditionally hard to parallelize Python workloads and significantly lessen the learning curve for big data analytics.
Walmart Labs leverages Dask, XGBoost, and RAPIDS to reduce training times by 100X, enabling fast model iteration and accuracy improvements to better serve their customers.
Scotiabank is using XGBoost and RAPIDS to improve its credit risk scorecards to make the bank more profitable and help more people who deserve loans to get them.

RAPIDS-powered applications

With more enterprise adoption, there is a surge of software vendors popping up, providing solutions built upon RAPIDS.

Companies like Coiled and Saturn Cloud are providing managed Dask solutions for scaling Python analytics in the cloud, leveraging RAPIDS to enable GPU acceleration on their platforms.
BlazingSQL, built on Dask and cuDF, makes it easy to query large raw file formats such as CSV and Apache Parquet inside data lakes like HDFS and Amazon S3, and directly pipe the results into GPU memory.
AnswerRocket is also using RAPIDS and NVIDIA GPUs to propel their enterprise analytics platform to make it easier for businesses to get more out of their data.

What does this mean for the future of data analytics?

In two years, large-scale analytics has become feasible. Despite significant advances in traditional distributed computing systems, scaling out to more CPUs becomes cost-prohibitive, decreasing the return on investment for businesses seeking more data-driven decisions.

With RAPIDS, these issues diminish. More people can explore the formerly time-consuming and costly world of data analytics and reap the benefits. With this new feasibility, we’ll see more traditionally complex data analytics operations becoming prevalent in products and services. Faster model iteration, more frequent deployment, and lower costs means more effective products and millions added to the bottom line.

With the expansion of the RAPIDS-accelerated analytics ecosystem and more market offerings built upon RAPIDS, high-performance analytics becomes more accessible to users with varying backgrounds and expertise. Higher-level of abstraction on top of RAPIDS means that someone without a formal data science or ML background can build and operationalize sophisticated analytical processes, baking analytics into more and more customer-facing tools. As data-driven decision-making becomes more prevalent to customers and users, the scene is set for a truly AI-driven future.

Conclusion

Over the last two years, RAPIDS has gone from proof that GPUs can be impactful to data analytics to a thriving ecosystem of tools with a growing market. As we drive towards the impending 1.0 release, we anticipate that the incremental changes in RAPIDS will aggregate into industry impacts. The last two years have felt influentual and, hopefully, trailblazing for future innovations and trends that could change how we interact with technology.