NVIDIA Merlin NVTabular

NVIDIA Merlin™ accelerates the entire pipeline, from ingesting and training to deploying GPU-accelerated recommender systems. Merlin NVTabular is a feature engineering and preprocessing library designed to efficiently manipulate terabyte-scale recommender system datasets and significantly reduce data preparation time. It provides efficient feature transformations, preprocessing, and a high-level abstraction that accelerates computation on GPUs using the RAPIDS™ cuDF library.

Download and Try It Today


Merlin NVTabular Core Features

Feature Engineering

NVTabular's fast feature transforms reduce data preparation time and ease the deployment of recommender models to production. With NVTabular's recommender-focused APIs, data scientists and machine learning engineers can quickly process datasets of all sizes and run more experiments without being bound by CPU or GPU memory. NVTabular also supports multi-hot categoricals and passing vector continuous features to ease feature engineering.
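To make the idea of a multi-hot categorical concrete, here is a minimal plain-Python sketch of encoding one. It is not NVTabular's API and does not run on the GPU; the function names, the `genres` column, and the choice to reserve id 0 for unknown categories are all assumptions made for illustration.

```python
from collections import Counter


def build_vocab(rows, min_count=1):
    """Map each category seen at least min_count times to an integer id.

    Id 0 is reserved for unknown or infrequent categories (an assumption
    of this sketch).
    """
    counts = Counter(cat for row in rows for cat in row)
    kept = sorted(cat for cat, n in counts.items() if n >= min_count)
    return {cat: idx for idx, cat in enumerate(kept, start=1)}


def encode_multi_hot(rows, vocab):
    """Replace every category in every row with its integer id."""
    return [[vocab.get(cat, 0) for cat in row] for row in rows]


# Hypothetical multi-hot column: each row lists the genres of one movie.
genres = [["action", "comedy"], ["drama"], ["comedy", "drama", "romance"]]

vocab = build_vocab(genres)                 # fit on the training rows
encoded = encode_multi_hot(genres, vocab)   # [[1, 2], [3], [2, 3, 4]]

# Categories that were not seen during fitting fall back to id 0.
unseen = encode_multi_hot([["horror", "comedy"]], vocab)  # [[0, 2]]
```

Each row keeps a variable number of ids, which is what distinguishes a multi-hot column from a one-hot column with a single category per row.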

Run examples

Interoperability with Open Source

Data scientists and machine learning engineers use a hybrid of methods, tools, libraries, and frameworks, including open source. NVTabular natively supports tabular data in comma-separated values (CSV), Apache Parquet, Apache ORC, and Apache Avro formats. NVTabular data loaders are also optimized for TensorFlow (TF), PyTorch, and Merlin HugeCTR. All Merlin components, including NVTabular, are interoperable with open source.

Learn more

Accelerated on GPUs

NVTabular provides a high-level abstraction that accelerates computation on GPUs using the RAPIDS cuDF library. NVTabular also supports multi-GPU and multi-node scaling with Dask-CUDA and dask.distributed, which enables distributed parallelism.
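The general pattern behind this kind of distributed preprocessing can be sketched in plain Python: each worker computes partial statistics on its own partition, and the partial results are merged into global statistics. The sketch below is an illustration under stated assumptions, not NVTabular or Dask code. Threads stand in for GPU workers, and the function names and sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def partial_stats(partition):
    """Per-partition count, sum, and sum of squares."""
    return (len(partition), sum(partition), sum(x * x for x in partition))


def merge_stats(parts):
    """Combine partial results into a global mean and standard deviation."""
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, math.sqrt(max(variance, 0.0))


# Hypothetical continuous column split into partitions, one per worker.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]

# Each worker only ever sees its own partition.
with ThreadPoolExecutor(max_workers=3) as pool:
    parts = list(pool.map(partial_stats, partitions))

mean, std = merge_stats(parts)

# Normalization can then be applied partition by partition as well.
normalized = [[(x - mean) / std for x in p] for p in partitions]
```

Because only three numbers per partition are exchanged, no worker needs the whole column in memory, which is why this style of computation is not bound by the memory of a single device.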

Explore more
Merlin NVTabular Performance

NVTabular's multi-GPU support, built on RAPIDS cuDF, Dask, and Dask-cuDF, enables a high-performance recommender-specific pipeline. When processing 1.3 TB of data from the Criteo Terabyte dataset, NVTabular running multi-GPU on the NVIDIA DGX™ A100 provides a 95x speedup compared to Spark on a four-node, 96 vCPU core CPU cluster. NVTabular also provides a 5.3x speedup when scaling from one to eight NVIDIA A100 GPUs, from 10 minutes on 1xA100 to 1.9 minutes on 8xA100.
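The multi-GPU figure can be checked with simple arithmetic. The sketch below recomputes the speedup from the two quoted run times and also derives a scaling efficiency. The efficiency value is derived here, not a number published in the benchmark, and it inherits the rounding of the quoted times.

```python
def speedup(baseline_minutes, accelerated_minutes):
    """Ratio of the baseline run time to the accelerated run time."""
    return baseline_minutes / accelerated_minutes


def scaling_efficiency(speedup_factor, n_devices):
    """Fraction of ideal linear scaling achieved across n_devices."""
    return speedup_factor / n_devices


# Run times quoted above: 10 minutes on 1xA100, 1.9 minutes on 8xA100.
s = speedup(10.0, 1.9)
eff = scaling_efficiency(s, 8)

print(f"speedup: {s:.1f}x, efficiency: {eff:.0%}")
# prints "speedup: 5.3x, efficiency: 66%"
```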

Explore more benchmark details

Speedup Using NVTabular on Multi-GPU

Get Started with Merlin NVTabular

All NVIDIA Merlin components are available as open-source projects on GitHub. However, a more convenient way to use these components is through the Merlin NVTabular containers from the NVIDIA NGC catalog. Containers package the software application, libraries, dependencies, and runtime compilers in a self-contained environment. This makes the application environment portable, consistent, reproducible, and agnostic to the underlying host system software configuration.


Merlin NVTabular on NGC

Merlin Training

Enables users to perform preprocessing and feature engineering with NVTabular and then train a deep learning-based recommender system model with HugeCTR.

Merlin TensorFlow Training

Utilize preprocessing and feature engineering with NVTabular and then train a deep learning-based recommender system model with TensorFlow.

Merlin PyTorch Training

Leverage preprocessing and feature engineering with NVTabular and then train a deep learning-based recommender system model with PyTorch.

Merlin Inference

Allows users to deploy NVTabular workflows and HugeCTR or TensorFlow models to the NVIDIA Triton™ Inference Server for production.


Merlin NVTabular on GitHub


The GitHub repository helps users get started with NVTabular with documentation, tutorials, examples, and notebooks.



Diagram illustrating components of NVIDIA Merlin

Merlin NVTabular Resources

Explore all Merlin resources.

Announcing NVTabular Open Beta

Discover how multi-GPU support and data loaders accelerate recommender workflows.

READ NOW

NVIDIA Merlin

NVIDIA Merlin consists of Merlin Feature Engineering (NVTabular), Merlin Training (HugeCTR), Merlin Inference (NVIDIA® TensorRT™ and Triton), and Merlin Reference Applications.

LEARN MORE

Merlin Technical Resource Kit

Learn how to accelerate the entire pipeline, from ingesting and training to deploying GPU-accelerated recommender systems.

DOWNLOAD THE KIT NOW

NVIDIA Deep Learning Institute (DLI)

Register for this DLI workshop to learn the fundamental tools and techniques for building highly effective recommender systems.

REGISTER NOW

NVTabular is available to download from the NVIDIA NGC catalog, the GitHub repository, or the Anaconda repository.

Download Now