NVIDIA Merlin is a framework for building high-performance, deep learning-based recommender systems.

Figure 1: NVIDIA Merlin Recommender System Framework

Merlin includes tools for building deep learning-based recommendation systems that provide better predictions than traditional methods and increase click-through rates. Each stage of the pipeline is optimized to support hundreds of terabytes of data, all accessible through easy-to-use APIs.

NVTabular reduces data preparation time by GPU-accelerating feature transformations and preprocessing.

HugeCTR is a deep neural network training framework that is capable of distributed training across multiple GPUs and nodes for maximum performance.

NVIDIA Triton™ Inference Server and NVIDIA® TensorRT™ accelerate production inference on GPUs for feature transforms and neural network execution.

Domain-Specific APIs

Provides APIs built specifically for managing the massive tabular datasets used in recommendation systems.

Robust Performance

Designed specifically for recommender datasets of 100+ terabytes and terabyte-scale embedding tables, with 10x the inference performance of other approaches.

State-of-the-Art Models

Supports state-of-the-art hybrid models such as Wide and Deep, Neural Collaborative Filtering (NCF), Variational Autoencoder (VAE), Deep Cross Network, DeepFM, and xDeepFM.

An End-to-End System Architecture

NVIDIA Merlin accelerates the entire pipeline for GPU-accelerated recommender systems, from data ingestion and training to deployment. Its models and tools simplify building and deploying a production-quality pipeline. We invite you to share some information about your recommender pipeline in this survey to influence the Merlin Roadmap.


NVTabular is a feature engineering and preprocessing library designed to quickly and easily manipulate terabyte-scale recommender system datasets. It provides a high-level abstraction and accelerates computation on GPUs using the RAPIDS cuDF library.
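To make "feature engineering and preprocessing" concrete, here is a minimal CPU-only sketch in plain Python of two transforms that are typical for recommender data: encoding categorical columns as integer ids and normalizing a continuous column. It does not use the NVTabular API. The function names `categorify` and `normalize`, the sample rows, and the choice to reserve id 0 for unseen values are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def categorify(values):
    """Map each distinct category to a contiguous integer id.

    Id 0 is reserved for values not seen when the mapping was built
    (an assumption of this sketch).
    """
    counts = Counter(values)
    # Most frequent categories get the smallest ids; ties are broken by value.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: idx + 1 for idx, value in enumerate(ordered)}


def normalize(values):
    """Rescale a continuous column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical interaction log: two categorical columns, one continuous column.
rows = [
    {"user_id": "u7", "item_id": "i3", "price": 10.0},
    {"user_id": "u2", "item_id": "i3", "price": 20.0},
    {"user_id": "u7", "item_id": "i9", "price": 30.0},
]

user_vocab = categorify([r["user_id"] for r in rows])
item_vocab = categorify([r["item_id"] for r in rows])
prices = normalize([r["price"] for r in rows])

# Encoded rows are what an embedding-based model would consume.
encoded = [
    {
        "user_id": user_vocab.get(r["user_id"], 0),
        "item_id": item_vocab.get(r["item_id"], 0),
        "price": p,
    }
    for r, p in zip(rows, prices)
]
```

A GPU-accelerated library would apply the same kind of column-wise transforms to data that is far too large to process this way, which is where the preprocessing time savings come from.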

Try it Today:
NGC | GitHub | Anaconda


HugeCTR is a highly efficient C++ framework designed for distributed training with model-parallel embedding tables and data-parallel neural networks. HugeCTR covers common and recent architectures such as Wide and Deep, Deep Cross Network, and DeepFM. Deep Learning Recommendation Model (DLRM) support is coming soon.
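The split between model-parallel embedding tables and data-parallel neural networks can be pictured with a small plain-Python sketch. It is a conceptual illustration only, not HugeCTR code: the two simulated devices, the `feature_id % num_devices` placement rule, and all function names are assumptions chosen to keep the example short.

```python
NUM_DEVICES = 2      # simulated devices (assumption for this sketch)
EMBEDDING_DIM = 4    # size of each embedding vector


def owner(feature_id, num_devices=NUM_DEVICES):
    """Device that stores the embedding row for this feature id."""
    return feature_id % num_devices


def build_shards(vocab_size, num_devices=NUM_DEVICES, dim=EMBEDDING_DIM):
    """Model parallelism: each device holds only its slice of the table."""
    shards = [dict() for _ in range(num_devices)]
    for fid in range(vocab_size):
        # Dummy vector; a real table would hold trainable weights.
        shards[owner(fid, num_devices)][fid] = [float(fid)] * dim
    return shards


def lookup(shards, feature_id):
    """Fetch a row from whichever device owns it."""
    return shards[owner(feature_id, len(shards))][feature_id]


def split_batch(batch, num_devices=NUM_DEVICES):
    """Data parallelism: each device gets a different slice of the batch."""
    return [batch[d::num_devices] for d in range(num_devices)]


shards = build_shards(vocab_size=10)
batch = [3, 8, 5, 0]  # feature ids of one mini-batch

per_device_batches = split_batch(batch)
# Each device gathers the embeddings its samples need, possibly from other
# devices, then would run its own replica of the dense network on them.
per_device_inputs = [
    [lookup(shards, fid) for fid in local_batch]
    for local_batch in per_device_batches
]
```

The point of the split is that embedding tables can be too large for one GPU's memory, so they are partitioned, while the dense layers are small enough to replicate on every GPU and train on different samples in parallel.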

Reference Applications

Get started with open-source reference implementations and achieve state-of-the-art accuracy on public datasets with up to 10x acceleration.

Try them Today:
Wide and Deep in TensorFlow
DLRM in PyTorch

TensorRT and Triton

Take advantage of Triton and TensorRT to run inference efficiently on GPUs by maximizing throughput with the right combination of latency and GPU utilization.

Try them Today:
TensorRT | Triton Inference Server