GPU-Accelerated JSON Data Processing with RAPIDS

JSON is a widely adopted text-based format for exchanging information between systems, most commonly in web applications. While the JSON format is human-readable, it is complex to process with data science and data engineering tools. To bridge that gap, RAPIDS cuDF provides a GPU-accelerated JSON reader (cudf.read_json) that is efficient and robust for many … Continued

Achieving 100x Faster Single-Cell Modality Prediction with NVIDIA RAPIDS cuML

Single-cell measurement technologies have advanced rapidly, revolutionizing the life sciences. We have scaled from measuring dozens to millions of cells and from one modality to multiple high dimensional modalities. The vast amounts of information at the level of individual cells present a great opportunity to train machine learning models to help us better understand the … Continued

Using the RAPIDS VM Image for Google Cloud Platform

NVIDIA’s Ty McKercher and Google’s Viacheslav Kovalevskyi and Gonzalo Gasca Meza jointly authored a post on using the new RAPIDS VM Image for Google Cloud Platform. Following is a short summary. For the full post, please see the full Google article. If you’re a data scientist, researcher, engineer, or developer using pandas, Dask, scikit-learn, … Continued

Building an Accelerated Data Science Ecosystem: RAPIDS Hits Two Years

GTC Fall 2020 marked the second anniversary of the initial release of RAPIDS. Created out of the GPU Open Analytics Initiative (GoAi) aimed at making accelerated, end-to-end analytics on GPUs easy, RAPIDS has proven GPUs are performant, easy to use, and transformative to the future of data analytics. By thinking about the relationship between software … Continued

Running Python UDFs in Native NVIDIA CUDA Kernels with the RAPIDS cuDF

In this post, I introduce a design and implementation of a framework within RAPIDS cuDF that enables compiling Python user-defined functions (UDFs) and inlining them into native CUDA kernels. This framework uses the Numba Python compiler and Jitify CUDA just-in-time (JIT) compilation library to provide cuDF users the flexibility of Python with the performance of … Continued

Accelerating Vector Search: Using GPU-Powered Indexes with RAPIDS RAFT

In the AI landscape of 2023, vector search is one of the hottest topics due to its applications in large language models (LLMs) and generative AI. Semantic vector search enables a broad range of important tasks like detecting fraudulent transactions, recommending products to users, using contextual information to augment full-text searches, and finding actors that … Continued

Reusable Computational Patterns for Machine Learning and Information Retrieval with RAPIDS RAFT

RAPIDS is a suite of accelerated libraries for data science and machine learning on GPUs: cuDF for pandas-like data structures, cuGraph for graph data, and cuML for machine learning. In many data analytics and machine learning algorithms, computational bottlenecks tend to come from a small subset of steps that dominate the end-to-end performance. Reusable solutions for … Continued

Streamline ETL Workflows with Nested Data Types in RAPIDS libcudf

Nested data types are a convenient way to represent hierarchical relationships within columnar data. They are frequently used as part of extract, transform, load (ETL) workloads in business intelligence, recommender systems, cybersecurity, geospatial, and other applications. List types can be used to easily attach multiple transactions to a user without creating a new lookup table, … Continued

RAPIDS Accelerates Data Science End-to-End

Today’s data science problems demand a dramatic increase in the scale of data as well as the computational power required to process it. Unfortunately, the end of Moore’s law means that handling large data sizes in today’s data science ecosystem requires scaling out to many CPU nodes, which brings its own problems of communication bottlenecks, energy, and … Continued