NVIDIA Merlin is an open beta framework for building large-scale deep learning recommender systems.


Figure 1: NVIDIA Merlin Open Beta Recommender System Framework

Merlin empowers data scientists, machine learning engineers, and researchers to build high-performing recommenders at scale. Merlin includes tools that democratize the building of deep learning recommenders, which deliver better predictions than traditional methods and increase click-through rates. Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, all accessible through easy-to-use APIs.

NVTabular reduces data preparation time by GPU-accelerating feature transformations and preprocessing.

HugeCTR is a deep neural network training framework that is capable of distributed training across multiple GPUs and nodes for maximum performance.

NVIDIA Triton™ Inference Server and NVIDIA® TensorRT™ accelerate production inference on GPUs for feature transforms and neural network execution.

Domain-Specific APIs

Features APIs built specifically for managing the massive tabular datasets used in recommender systems.

Robust Performance

Specifically designed for 100+ terabyte recommender datasets and terabyte embedding tables with 10X the inference performance of other approaches.

State-of-the-Art Models

Supports state-of-the-art hybrid models such as Wide and Deep, Neural Collaborative Filtering (NCF), Variational Autoencoder (VAE), Deep Cross Network, DeepFM, and xDeepFM.

An End-to-End System Architecture

NVIDIA Merlin accelerates the entire pipeline from ingesting and training to deploying GPU-accelerated recommender systems. Models and tools simplify building and deploying a production-quality pipeline. We invite you to share some information about your recommender pipeline in this survey to influence the Merlin Roadmap.


NVTabular is a feature engineering and preprocessing library designed to quickly and easily manipulate terabytes of recommender system datasets. It provides a high-level abstraction and accelerates computation on GPUs using the RAPIDS cuDF library.
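To make the preprocessing step concrete, here is a minimal plain-Python sketch (not the NVTabular API itself) of the two transforms such a pipeline applies most often: mapping raw categorical values to contiguous integer IDs for embedding lookup, and standardizing continuous features. Function names are illustrative; NVTabular performs the equivalent work on the GPU via cuDF.

```python
def categorify(values):
    """Map each distinct categorical value to a contiguous integer ID.

    IDs start at 1 so that 0 can be reserved for unseen (out-of-vocabulary)
    values at inference time.
    """
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return [mapping[v] for v in values], mapping

def normalize(values):
    """Standardize a continuous column to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # guard against a constant column
    return [(v - mean) / std for v in values]

# Encode a categorical column and scale a numeric one.
item_ids, vocab = categorify(["shoe", "hat", "shoe", "scarf"])
prices = normalize([10.0, 20.0, 10.0, 40.0])
```

The resulting integer IDs index directly into embedding tables during training, which is why the categorical encoding step must be consistent between preprocessing and serving.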


HugeCTR is a highly efficient C++ framework designed for distributed training with model-parallel embedding tables and data-parallel neural networks. HugeCTR covers common and recent architectures such as Deep Learning Recommendation Model (DLRM), Wide and Deep, Deep Cross Network, and DeepFM.
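The hybrid parallelism described above can be sketched in a few lines of plain Python: the huge embedding table is sharded across devices by key (model parallelism), while each device would run its own replica of the dense network on its local batch (data parallelism). All names here are illustrative assumptions for the sketch, not the HugeCTR API.

```python
import random

NUM_DEVICES = 4   # number of GPUs in the illustrative cluster
EMBED_DIM = 8     # embedding vector width

# Each device owns only the embedding rows whose key hashes to it.
shards = [dict() for _ in range(NUM_DEVICES)]

def owner(key):
    """Decide which device shard holds the row for `key`."""
    return hash(key) % NUM_DEVICES

def lookup(key):
    """Fetch (lazily initializing) the embedding for `key` from its shard."""
    shard = shards[owner(key)]
    if key not in shard:
        random.seed(key)  # deterministic init, for the sketch only
        shard[key] = [random.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    return shard[key]

# A batch of categorical IDs gathers its embeddings from across all shards,
# mimicking the all-to-all exchange before the data-parallel dense layers.
batch = [101, 202, 303, 101]
embeddings = [lookup(k) for k in batch]
```

Sharding by key is what lets the embedding table exceed any single GPU's memory, while the dense layers stay small enough to replicate per device.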

Reference Applications

Get started with open source reference implementations and achieve state-of-the-art accuracy on public datasets with up to 10X acceleration.

TensorRT and Triton

Take advantage of Triton and TensorRT to run inference efficiently on GPUs, maximizing throughput at the right balance of latency and GPU utilization.
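A key mechanism behind that throughput/latency trade-off is dynamic batching: individual inference requests are queued and grouped into larger batches, trading a small queuing delay for far better GPU utilization. The sketch below is a conceptual plain-Python illustration; the batch size and function names are assumptions, not Triton configuration values.

```python
MAX_BATCH_SIZE = 8  # illustrative cap, analogous to a server's max batch size

def dynamic_batches(request_stream, max_batch_size=MAX_BATCH_SIZE):
    """Greedily pack queued requests into batches up to max_batch_size."""
    batch = []
    for req in request_stream:
        batch.append(req)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final partial batch; in a real server, a queuing
        # timeout bounds how long a partial batch may wait.
        yield batch

# Twenty single requests become three batched GPU executions instead of twenty.
requests = list(range(20))
batches = list(dynamic_batches(requests))
```

Raising the batch cap improves throughput but lengthens the tail latency of requests that wait for a batch to fill, which is exactly the knob the text refers to.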