NVIDIA Merlin NVTabular
NVIDIA Merlin™ accelerates the entire pipeline, from ingesting and training to deploying GPU-accelerated recommender systems. Merlin NVTabular is a feature engineering and preprocessing library designed to effectively manipulate terabytes of recommender system datasets and significantly reduce data preparation time. It provides efficient feature transformations, preprocessing, and high-level abstraction that accelerates computation on GPUs using the RAPIDS™ cuDF library.
Merlin NVTabular Core Features
NVTabular's fast feature transforms reduce data preparation time and ease the deployment of recommender models to production. With NVTabular's recommender-focused APIs, data scientists and machine learning engineers can quickly process datasets of all sizes and run more experiments without being bound by CPU or GPU memory. NVTabular also supports passing multi-hot categorical features and vector continuous features to ease feature engineering.
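As a rough illustration of what a recommender-focused feature pipeline can look like, the sketch below chains a few NVTabular operators into a workflow. The column names (`user_id`, `item_id`, `price`) and the helper functions `build_workflow` and `preprocess` are hypothetical and chosen only for this example. The choice of `Categorify`, `FillMissing`, and `Normalize` is an assumption about a typical pipeline, not something this page prescribes. Running the workflow requires an environment with NVTabular and a supported GPU.

```python
# Hypothetical column names, used only for illustration.
CATEGORICAL_COLUMNS = ["user_id", "item_id"]
CONTINUOUS_COLUMNS = ["price"]


def build_workflow():
    """Build an NVTabular workflow (requires nvtabular to be installed)."""
    # Imported lazily so this module can be loaded without NVTabular present.
    import nvtabular as nvt
    from nvtabular import ops

    # Encode categorical columns as contiguous integer IDs.
    cat_features = CATEGORICAL_COLUMNS >> ops.Categorify()
    # Fill missing values, then standardize continuous columns.
    cont_features = CONTINUOUS_COLUMNS >> ops.FillMissing() >> ops.Normalize()
    return nvt.Workflow(cat_features + cont_features)


def preprocess(input_path, output_path):
    """Fit the workflow on a Parquet dataset and write the transformed data."""
    import nvtabular as nvt

    workflow = build_workflow()
    dataset = nvt.Dataset(input_path, engine="parquet")
    workflow.fit(dataset)  # gather statistics such as category mappings
    workflow.transform(dataset).to_parquet(output_path)
    return workflow
```

The dataset is read and processed in partitions, which is one way a pipeline like this can avoid being bound by the memory of a single device.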
Interoperability with Open Source
Data scientists and machine learning engineers use a hybrid of methods, tools, libraries, and frameworks, including open source. NVTabular natively supports tabular data in comma-separated values (CSV), Apache Parquet, Apache ORC, and Apache Avro formats. NVTabular data loaders are also optimized for TensorFlow (TF), PyTorch, and Merlin HugeCTR. All Merlin components, including NVTabular, are interoperable with open source.
Accelerated on GPUs
NVTabular provides a high-level abstraction that accelerates computation on GPUs using the RAPIDS cuDF library. NVTabular also supports multi-GPU and multi-node scaling with Dask-CUDA and dask.distributed, which accelerates distributed parallel processing.
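A minimal sketch of starting a single-node, multi-GPU Dask cluster with Dask-CUDA and dask.distributed is shown below. The helper names `cluster_options` and `start_local_gpu_cluster`, the device list `"0,1"`, and the `"24GB"` memory limit are assumptions made for illustration. How the resulting client is handed to NVTabular differs between NVTabular releases, so that step is deliberately left out.

```python
def cluster_options(visible_devices, device_memory_limit="24GB"):
    """Collect keyword arguments for dask_cuda.LocalCUDACluster."""
    return {
        # Which GPUs get a worker, e.g. "0,1" for the first two devices.
        "CUDA_VISIBLE_DEVICES": visible_devices,
        # Per-worker GPU memory threshold before spilling to host memory.
        "device_memory_limit": device_memory_limit,
    }


def start_local_gpu_cluster(visible_devices="0,1"):
    """Start one Dask worker per listed GPU and return a connected client.

    Requires dask, distributed, and dask-cuda, plus the listed GPUs.
    """
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    cluster = LocalCUDACluster(**cluster_options(visible_devices))
    return Client(cluster)
```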
Merlin NVTabular Performance
NVTabular's multi-GPU support using RAPIDS cuDF, Dask, and Dask-cuDF enables a high-performance recommender-specific pipeline. NVTabular multi-GPU on the NVIDIA DGX™ A100 provides a 95x speedup compared to Spark on a four-node, 96 vCPU core CPU cluster when processing 1.3 TB of data in the Criteo Terabyte dataset. It also provides a 5.3x speedup when scaling from one to eight NVIDIA A100 GPUs, from 10 minutes on 1xA100 to 1.9 minutes on 8xA100.
Speedup Using NVTabular on Multi-GPU
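The multi-GPU figures quoted above can be checked with a short calculation. The sketch below uses only the two timings stated on this page (10 minutes on one A100 and 1.9 minutes on eight). The function names are hypothetical, and scaling efficiency (speedup divided by GPU count) is a derived quantity that the page itself does not report.

```python
def speedup(baseline_minutes, accelerated_minutes):
    """Ratio of the baseline runtime to the accelerated runtime."""
    return baseline_minutes / accelerated_minutes


def scaling_efficiency(speedup_factor, num_gpus):
    """Fraction of ideal linear scaling achieved across num_gpus."""
    return speedup_factor / num_gpus


factor = speedup(10.0, 1.9)                # 1xA100 vs. 8xA100 timings
efficiency = scaling_efficiency(factor, 8)

print(f"speedup: {factor:.1f}x")           # speedup: 5.3x
print(f"efficiency: {efficiency:.0%}")     # efficiency: 66%
```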
Get Started with Merlin NVTabular
All NVIDIA Merlin components are available as open-source projects on GitHub. However, a more convenient way to use these components is through the Merlin NVTabular containers from the NVIDIA NGC catalog. Containers package the software application, libraries, dependencies, and runtime compilers in a self-contained environment. This way, the application environment is portable, consistent, reproducible, and agnostic to the underlying host system software configuration.
Merlin NVTabular on NGC
This container enables users to preprocess data and engineer features with NVTabular and then train a deep learning-based recommender system model with HugeCTR.
Merlin TensorFlow Training
Utilize preprocessing and feature engineering with NVTabular and then train a deep learning-based recommender system model with TensorFlow.
Merlin PyTorch Training
Leverage preprocessing and feature engineering with NVTabular and then train a deep learning-based recommender system model with PyTorch.
A separate inference container allows users to deploy NVTabular workflows and HugeCTR or TensorFlow models to the NVIDIA Triton™ Inference Server for production.
Merlin NVTabular on GitHub
The GitHub repository helps users get started with NVTabular with documentation, tutorials, examples, and notebooks.
Merlin NVTabular Resources
Announcing NVTabular Open Beta
Discover how multi-GPU support and data loaders accelerate recommender workflows.
NVIDIA Merlin consists of Merlin Feature Engineering: NVTabular, Merlin Training: HugeCTR, Merlin Inference: NVIDIA® TensorRT™ and Triton, and Merlin Reference Applications.
Merlin Technical Resource Kit
Learn how to accelerate the entire pipeline, from ingesting and training to deploying GPU-accelerated recommender systems.