cuDF

Aug 22, 2025
How to Spot (and Fix) 5 Common Performance Bottlenecks in pandas Workflows
Slow data loads, memory-intensive joins, and long-running operations—these are problems every Python practitioner has faced. They waste valuable time and make...
7 MIN READ

Aug 07, 2025
Efficient Transforms in cuDF Using JIT Compilation
RAPIDS cuDF offers a broad set of ETL algorithms for processing data with GPUs. For pandas users, cuDF-accelerated algorithms are available with the zero code...
9 MIN READ

Aug 01, 2025
7 Drop-In Replacements to Instantly Speed Up Your Python Data Science Workflows
You've been there. You wrote the perfect Python script, tested it on a sample CSV, and everything worked flawlessly. But when you unleashed it on the full 10...
8 MIN READ

Jul 18, 2025
3 pandas Workflows That Slowed to a Crawl on Large Datasets—Until We Turned on GPUs
If you work with pandas, you’ve probably hit the wall. It’s that moment when your trusty workflow, so elegant on smaller datasets, grinds to a halt on a...
4 MIN READ

Jul 17, 2025
Feature Engineering at Scale: Optimizing ML Models in Semiconductor Manufacturing with NVIDIA CUDA‑X Data Science
In our previous post, we introduced the setup of predictive modeling in chip manufacturing and operations, highlighting common challenges such as imbalanced...
6 MIN READ

Jun 27, 2025
How to Work with Data Exceeding VRAM in the Polars GPU Engine
In high-stakes fields such as quant finance, algorithmic trading, and fraud detection, data practitioners frequently need to process hundreds of gigabytes (GB)...
4 MIN READ

Jun 18, 2025
AI in Manufacturing and Operations at NVIDIA: Accelerating ML Models with NVIDIA CUDA-X Data Science
NVIDIA leverages data science and machine learning to optimize chip manufacturing and operations workflows—from wafer fabrication and circuit probing to...
8 MIN READ

May 07, 2025
Building Nemotron-CC, A High-Quality Trillion Token Dataset for LLM Pretraining from Common Crawl Using NVIDIA NeMo Curator
Curating high-quality pretraining datasets is critical for enterprise developers aiming to train state-of-the-art large language models (LLMs). To enable...
7 MIN READ

Apr 17, 2025
Grandmaster Pro Tip: Winning First Place in Kaggle Competition with Feature Engineering Using cuDF pandas
Feature engineering remains one of the most effective ways to improve model accuracy when working with tabular data. Unlike domains such as NLP and computer...
5 MIN READ

Apr 10, 2025
Efficiently Scaling Polars GPU Parquet Reader
When working with large datasets, the performance of your data processing tools becomes critical. Polars, an open-source library for data manipulation known for...
4 MIN READ

Feb 20, 2025
JSON Lines Reading with pandas 100x Faster Using NVIDIA cuDF
JSON is a widely adopted text-based format for exchanging information between systems, most commonly in web applications and large language models...
10 MIN READ

Feb 06, 2025
Get Started with GPU Acceleration for Data Science
In data science, operational efficiency is key to handling increasingly complex and large datasets. GPU acceleration has become essential for modern workflows,...
8 MIN READ

Jan 29, 2025
Accelerating JSON Processing on Apache Spark with GPUs
JSON is a popular format for text-based data that allows for interoperability between systems in web applications as well as data management. The format has...
9 MIN READ

Jan 13, 2025
Upcoming Webinar: Inside the RAPIDS-Accelerated Polars GPU Engine
In the webinar on January 28th, you'll get an inside look at the new GPU engine and learn how Polars' declarative API and query optimizer enable seamless GPU...
1 MIN READ

Dec 19, 2024
Enhance Your Training Data with New NVIDIA NeMo Curator Classifier Models
Classifier models are specialized in categorizing data into predefined groups or classes, playing a crucial role in optimizing data processing pipelines for...
11 MIN READ

Dec 19, 2024
RAPIDS 24.12 Introduces cuDF on PyPI, CUDA Unified Memory for Polars, and Faster GNNs
RAPIDS 24.12 introduces cuDF packages to PyPI, speeds up groupby aggregations and reading files from AWS S3, enables larger-than-GPU memory queries in the...
8 MIN READ