Apache Spark

Mar 11, 2025
Efficient ETL with Polars and Apache Spark on NVIDIA Grace CPU
The NVIDIA Grace CPU Superchip delivers outstanding performance and best-in-class energy efficiency for CPU workloads in the data center and in the cloud. The...
7 MIN READ

Mar 06, 2025
Accelerate Apache Spark ML on NVIDIA GPUs with Zero Code Change
The NVIDIA RAPIDS Accelerator for Apache Spark software plug-in pioneered a zero code change user experience (UX) for GPU-accelerated data processing. It...
5 MIN READ

Jan 29, 2025
Accelerating JSON Processing on Apache Spark with GPUs
JSON is a popular format for text-based data that allows for interoperability between systems in web applications as well as data management. The format has...
9 MIN READ

Aug 20, 2024
NVIDIA GH200 Superchip Delivers Breakthrough Energy Efficiency and Node Consolidation for Apache Spark
With the rapid growth of generative AI, CIOs and IT leaders are looking for ways to reclaim data center resources to accommodate new AI use cases that promise...
8 MIN READ

Jun 14, 2024
Level Up Your Skills with Five New NVIDIA Technical Courses
With AI introducing an unprecedented pace of technological innovation, staying ahead means keeping your skills up to date. The NVIDIA Developer Program gives...
4 MIN READ

Nov 09, 2023
Accelerating Neurosymbolic AI with RAPIDS and Prometheux Vadalog Parallel
As the scale of available data continues to grow, so does the need for scalable and intelligent data processing systems to swiftly harness useful knowledge....
11 MIN READ

Oct 24, 2023
Reduce Apache Spark ML Compute Costs with New Algorithms in Spark RAPIDS ML Library
Spark RAPIDS ML is an open-source Python package enabling NVIDIA GPU acceleration of PySpark MLlib. It offers PySpark MLlib DataFrame API compatibility and...
8 MIN READ

Oct 18, 2023
New Self-Paced Course: RAPIDS Accelerator for Apache Spark
Dive into the RAPIDS Accelerator for Apache Spark toolset, including the workload qualification tool for estimating speedup on GPU and the profiling tool for...
1 MIN READ

Sep 14, 2023
ICYMI: Run RAPIDS-Accelerated Apache Spark on Amazon EMR
Streamline and accelerate deployment by integrating ETL and ML training into a single Apache Spark script on Amazon EMR.
1 MIN READ

Sep 06, 2023
GPUs for ETL? Optimizing ETL Architecture for Apache Spark SQL Operations
Extract-transform-load (ETL) operations with GPUs using the NVIDIA RAPIDS Accelerator for Apache Spark running on large-scale data can produce both cost savings...
8 MIN READ

Jul 17, 2023
GPUs for ETL? Run Faster, Less Costly Workloads with NVIDIA RAPIDS Accelerator for Apache Spark and Databricks
We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on...
7 MIN READ

Jun 12, 2023
Distributed Deep Learning Made Easy with Spark 3.4
Apache Spark is an industry-leading platform for distributed extract, transform, and load (ETL) workloads on large-scale data. However, with the advent of deep...
7 MIN READ

Jun 02, 2023
GPU Integration Propels Data Center Efficiency and Cost Savings for Taboola
When you see a context-relevant advertisement on a web page, it's most likely content served by a Taboola data pipeline. As the leading content recommendation...
13 MIN READ

Apr 18, 2023
New GPU Library Lowers Compute Costs for Apache Spark ML
Spark MLlib is a key component of Apache Spark for large-scale machine learning and provides built-in implementations of many popular machine learning...
6 MIN READ

Apr 04, 2023
Topic Modeling and Image Classification with Dataiku and NVIDIA Data Science
The Dataiku platform for everyday AI simplifies deep learning. Use cases are far-reaching, from image classification to object detection and natural language...
11 MIN READ

Mar 21, 2023
Catapulting Enterprises to the Leading Edge of AI with NVIDIA AI Enterprise 3.1
Generative AI has marked an important milestone in the AI revolution journey. We are at a fundamental breaking point where enterprises are not only getting...
4 MIN READ