Apache Spark

Oct 06, 2025

Accelerating Large-Scale Data Analytics with GPU-Native Velox and NVIDIA cuDF

As workloads scale and demand for faster data processing grows, GPU-accelerated databases and query engines have been shown to deliver significant...

7 MIN READ

Jul 23, 2025

Serverless Distributed Data Processing with Apache Spark and NVIDIA AI on Azure

The process of converting vast libraries of text into numerical representations known as embeddings is essential for generative AI. Various technologies—from...

9 MIN READ

May 19, 2025

Spotlight: Atgenomix SeqsLab Scales Health Omics Analysis for Precision Medicine

In traditional clinical medical practice, treatment decisions are often based on general guidelines, past experiences, and trial-and-error approaches. Today,...

9 MIN READ

May 15, 2025

Predicting Performance on Apache Spark with GPUs

The world of big data analytics is constantly seeking ways to accelerate processing and reduce infrastructure costs. Apache Spark has become a leading platform...

9 MIN READ

May 08, 2025

Accelerate Deep Learning and LLM Inference with Apache Spark in the Cloud

Apache Spark is an industry-leading platform for big data processing and analytics. With the increasing prevalence of unstructured data—documents, emails,...

10 MIN READ

Apr 03, 2025

Accelerating Apache Parquet Scans on Apache Spark with GPUs

As data sizes have grown in enterprises across industries, Apache Parquet has become a prominent format for storing data. Apache Parquet is a columnar storage...

8 MIN READ

Mar 11, 2025

Efficient ETL with Polars and Apache Spark on NVIDIA Grace CPU

The NVIDIA Grace CPU Superchip delivers outstanding performance and best-in-class energy efficiency for CPU workloads in the data center and in the cloud. The...

7 MIN READ

Mar 06, 2025

Accelerate Apache Spark ML on NVIDIA GPUs with Zero Code Change

The NVIDIA RAPIDS Accelerator for Apache Spark software plug-in pioneered a zero code change user experience (UX) for GPU-accelerated data processing. It...

5 MIN READ

Jan 29, 2025

Accelerating JSON Processing on Apache Spark with GPUs

JSON is a popular format for text-based data that allows for interoperability between systems in web applications as well as data management. The format has...

9 MIN READ

Aug 20, 2024

NVIDIA GH200 Superchip Delivers Breakthrough Energy Efficiency and Node Consolidation for Apache Spark

With the rapid growth of generative AI, CIOs and IT leaders are looking for ways to reclaim data center resources to accommodate new AI use cases that promise...

8 MIN READ

Jun 14, 2024

Level Up Your Skills with Five New NVIDIA Technical Courses

With AI introducing an unprecedented pace of technological innovation, staying ahead means keeping your skills up to date. The NVIDIA Developer Program gives...

4 MIN READ

Nov 09, 2023

Accelerating Neurosymbolic AI with RAPIDS and Prometheux Vadalog Parallel

As the scale of available data continues to grow, so does the need for scalable and intelligent data processing systems to swiftly harness useful knowledge....

11 MIN READ

Oct 24, 2023

Reduce Apache Spark ML Compute Costs with New Algorithms in Spark RAPIDS ML Library

Spark RAPIDS ML is an open-source Python package enabling NVIDIA GPU acceleration of PySpark MLlib. It offers PySpark MLlib DataFrame API compatibility and...

8 MIN READ

Oct 18, 2023

New Self-Paced Course: RAPIDS Accelerator for Apache Spark

Dive into the RAPIDS Accelerator for Apache Spark toolset, including the workload qualification tool for estimating speedup on GPU and the profiling tool for...

1 MIN READ

Sep 14, 2023

ICYMI: Run RAPIDS-Accelerated Apache Spark on Amazon EMR

Streamline and accelerate deployment by integrating ETL and ML training into a single Apache Spark script on Amazon EMR.

1 MIN READ

Sep 06, 2023

GPUs for ETL? Optimizing ETL Architecture for Apache Spark SQL Operations

Extract-transform-load (ETL) operations with GPUs using the NVIDIA RAPIDS Accelerator for Apache Spark running on large-scale data can produce both cost...

8 MIN READ