ETL Processing

Sep 06, 2023
GPUs for ETL? Optimizing ETL Architecture for Apache Spark SQL Operations
Extract-transform-load (ETL) operations with GPUs using the NVIDIA RAPIDS Accelerator for Apache Spark running on large-scale data can produce both cost savings...
8 MIN READ

Jul 17, 2023
GPUs for ETL? Run Faster, Less Costly Workloads with NVIDIA RAPIDS Accelerator for Apache Spark and Databricks
We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on...
7 MIN READ

Jul 13, 2023
Whole Slide Image Analysis in Real Time with MONAI and RAPIDS
Digital pathology slide scanners generate massive images. Glass slides are routinely scanned at 40x magnification, resulting in gigapixel images. Compression...
10 MIN READ

Jul 12, 2023
Apache Airflow for Authoring Workflows in NVIDIA Base Command Platform
So, you have a ton of data pipelines today and are considering investing in GPU acceleration through NVIDIA Base Command Platform. What steps should you take?...
17 MIN READ

Jun 28, 2023
How to Deploy an AI Model in Python with PyTriton
AI models are everywhere, in the form of chatbots, classification and summarization tools, image models for segmentation and detection, recommendation models,...
6 MIN READ

Jun 12, 2023
Distributed Deep Learning Made Easy with Spark 3.4
Apache Spark is an industry-leading platform for distributed extract, transform, and load (ETL) workloads on large-scale data. However, with the advent of deep...
7 MIN READ

Mar 15, 2023
Smarter Retail Data Analytics with GPU Accelerated Apache Spark Workloads on Google Cloud Dataproc
A retailer's supply chain includes the sourcing of raw materials or finished goods from suppliers; storing them in warehouses or distribution centers; and...
13 MIN READ

Dec 05, 2022
Scraping Real-Estate Sites for Data Acquisition with Scrapy
Data is one of the most valuable assets that a business can possess. It sits at the core of data science and data analysis: without data, they’re both...
13 MIN READ

Sep 12, 2022
Scaling Data Pipelines: AT&T Optimizes Speed, Cost, and Efficiency with GPUs
It is well known that GPUs are the typical go-to solution for large machine learning (ML) applications, but what if GPUs were applied to earlier stages of the...
10 MIN READ

Aug 30, 2022
Accelerating ETL on KubeFlow with RAPIDS
In the machine learning and MLOps world, GPUs are widely used to speed up model training and inference, but what about the other stages of the workflow like ETL...
13 MIN READ

Jul 29, 2022
Evaluating Data Lakes and Data Warehouses as Machine Learning Data Repositories
Data is the lifeblood of modern enterprises, whether you’re a retailer, financial service company, or digital advertiser. Across industries, organizations are...
11 MIN READ

Mar 23, 2021
Accelerating Analytics and AI with Alluxio and NVIDIA GPUs
Data processing is increasingly making use of NVIDIA computing for massive parallelism. Advancements in accelerated compute mean that access to storage must...
10 MIN READ

Oct 05, 2020
Announcing the NVIDIA NVTabular Open Beta with Multi-GPU Support and New Data Loaders
Recently, NVIDIA CEO Jensen Huang announced updates to the open beta of NVIDIA Merlin, an end-to-end framework that democratizes the development of large-scale...
12 MIN READ

Jul 15, 2020
Accelerating ETL for Recommender Systems on NVIDIA GPUs with NVTabular
Recommender systems are ubiquitous in online platforms, helping users navigate through an exponentially growing number of goods and services. These models are...
11 MIN READ