Posts by Gregory Kimball
Data Science
Aug 07, 2025
Efficient Transforms in cuDF Using JIT Compilation
RAPIDS cuDF offers a broad set of ETL algorithms for processing data with GPUs. For pandas users, cuDF accelerated algorithms are available with the zero code...
9 MIN READ
Data Center / Cloud
Mar 11, 2025
Efficient ETL with Polars and Apache Spark on NVIDIA Grace CPU
The NVIDIA Grace CPU Superchip delivers outstanding performance and best-in-class energy efficiency for CPU workloads in the data center and in the cloud. The...
7 MIN READ
Data Science
Feb 20, 2025
JSON Lines Reading with pandas 100x Faster Using NVIDIA cuDF
JSON is a widely adopted format for text-based information working interoperably between systems, most commonly in web applications and large language models...
10 MIN READ
Data Science
Nov 28, 2024
Supercharging Deduplication in pandas Using RAPIDS cuDF
A common operation in data analytics is to drop duplicate rows. Deduplication is critical in Extract, Transform, Load (ETL) workflows, where you might want to...
12 MIN READ
Data Science
Sep 11, 2024
Scaling Up to One Billion Rows of Data in pandas using RAPIDS cuDF
The One Billion Row Challenge is a fun benchmark to showcase basic data processing operations. It was originally launched as a pure-Java competition, and has...
11 MIN READ
Data Science
Jul 17, 2024
Encoding and Compression Guide for Parquet String Data Using RAPIDS
Parquet writers provide encoding and compression options that are turned off by default. Enabling these options may provide better lossless compression for your...
10 MIN READ