Posts by Gregory Kimball
Data Science
Nov 28, 2024
Supercharging Deduplication in pandas Using RAPIDS cuDF
A common operation in data analytics is to drop duplicate rows. Deduplication is critical in Extract, Transform, Load (ETL) workflows, where you might want to...
12 MIN READ
Data Science
Sep 11, 2024
Scaling Up to One Billion Rows of Data in pandas using RAPIDS cuDF
The One Billion Row Challenge is a fun benchmark to showcase basic data processing operations. It was originally launched as a pure-Java competition, and has...
11 MIN READ
Data Science
Jul 17, 2024
Encoding and Compression Guide for Parquet String Data Using RAPIDS
Parquet writers provide encoding and compression options that are turned off by default. Enabling these options may provide better lossless compression for your...
10 MIN READ
Data Science
Dec 15, 2023
Streamline ETL Workflows with Nested Data Types in RAPIDS libcudf
Nested data types are a convenient way to represent hierarchical relationships within columnar data. They are frequently used as part of extract, transform,...
10 MIN READ
Data Science
Feb 09, 2023
GPU-Accelerated JSON Data Processing with RAPIDS
JSON is a widely adopted text-based format for exchanging information interoperably between systems, most commonly in web applications. While the JSON format is...
8 MIN READ
Data Science
Oct 17, 2022
Mastering String Transformations in RAPIDS libcudf
Efficient processing of string data is vital for many data science applications. To extract valuable information from string data, RAPIDS libcudf provides...
17 MIN READ