NVIDIA Merlin™ accelerates the entire pipeline of GPU-accelerated recommender systems, from data ingestion and training to deployment. Merlin HugeCTR (Huge Click-Through-Rate) is a deep neural network (DNN) training and inference framework designed for recommender systems. It provides distributed training with model-parallel embedding tables, an embedding cache, and data-parallel neural networks across multiple GPUs and nodes for maximum performance. HugeCTR covers common and recent architectures such as Deep Learning Recommendation Model (DLRM), Wide and Deep, Deep & Cross Network (DCN), and DeepFM.

Merlin HugeCTR Core Features

Training Embeddings at Scale

Data scientists and machine learning engineers building deep learning recommenders work with large embedding tables that often exceed the memory of a single GPU. Merlin HugeCTR's model-parallel embedding tables and embedding cache are designed for these recommender workflows: by distributing a table across the memory of multiple GPUs and nodes, HugeCTR can train embedding tables far larger than any single GPU could hold. HugeCTR also uses the NVIDIA Collective Communication Library (NCCL) for high-speed multi-node, multi-GPU communication at scale.
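To see why model parallelism matters, consider the arithmetic for a hypothetical table of 10^9 rows holding 128-dimensional FP32 vectors: 10^9 × 128 × 4 bytes = 512 GB, far more than one GPU holds, but 64 GB per GPU when split evenly across eight GPUs. The Python sketch below computes those sizes and shows one simple way to assign rows to GPUs (key modulo GPU count). The placement rule, function names, and table dimensions are illustrative assumptions, not HugeCTR's API or its exact sharding scheme.

```python
from collections import defaultdict


def table_bytes(rows: int, dim: int, bytes_per_elem: int = 4) -> int:
    """Memory needed for a full embedding table (FP32 by default)."""
    return rows * dim * bytes_per_elem


def bytes_per_gpu(rows: int, dim: int, num_gpus: int, bytes_per_elem: int = 4) -> int:
    """Memory per GPU when rows are split evenly (rounded up so no row is lost)."""
    rows_per_gpu = -(-rows // num_gpus)  # ceiling division
    return rows_per_gpu * dim * bytes_per_elem


def owner_gpu(key: int, num_gpus: int) -> int:
    """Hypothetical placement rule: row `key` lives on GPU `key % num_gpus`."""
    return key % num_gpus


def shard_keys(keys, num_gpus: int) -> dict:
    """Group a batch's lookup keys by the GPU that owns each row.

    Each GPU then looks up only its own rows; the partial results are
    exchanged between GPUs with collective communication (the role NCCL
    plays in HugeCTR).
    """
    shards = defaultdict(list)
    for key in keys:
        shards[owner_gpu(key, num_gpus)].append(key)
    return dict(shards)


if __name__ == "__main__":
    print(table_bytes(10**9, 128) / 1e9)       # 512.0 (GB for the whole table)
    print(bytes_per_gpu(10**9, 128, 8) / 1e9)  # 64.0 (GB per GPU with 8 GPUs)
    print(shard_keys([0, 1, 2, 5, 8, 9], 4))   # {0: [0, 8], 1: [1, 5, 9], 2: [2]}
```

Modulo placement spreads consecutive keys evenly across GPUs, which keeps the per-GPU memory close to the ceiling computed by `bytes_per_gpu`; production systems may use other strategies, such as placing whole feature slots on individual GPUs.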

Learn more
NVIDIA Collective Communication Library (NCCL)
HugeCTR's embedding layer

Inherently Asynchronous, Multi-Threaded Pipeline

Effective data loading is challenging for machine learning engineers and data scientists who are continuously experimenting with, training, and fine-tuning recommender models. HugeCTR's data reader is inherently asynchronous and multi-threaded, so data loading can overlap with training. It reads batches of data records whose features are high-dimensional, sparse, or categorical; any dense numerical features in a record are fed directly to the fully connected layers. HugeCTR's embedding layer compresses the sparse input features into dense embedding vectors. HugeCTR's model parallelism enables embedding training across multiple nodes and GPUs in a homogeneous cluster.
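The data flow described above can be sketched in plain Python. A background thread groups records into batches and hands them to the consumer through a bounded queue, so batching overlaps with the work done on earlier batches; the consumer then sum-pools the embedding rows of each categorical slot and concatenates the pooled vectors with the record's dense features. The record layout, the toy embedding table, sum pooling, and the single reader thread (standing in for HugeCTR's multiple reader workers) are all simplifying assumptions, not HugeCTR's implementation.

```python
import queue
import threading

EMB_DIM = 4  # hypothetical embedding width


def make_table(vocab_size: int, dim: int = EMB_DIM):
    """Toy embedding table with deterministic weights: row k = [k*dim, ..., k*dim + dim - 1]."""
    return [[float(k * dim + j) for j in range(dim)] for k in range(vocab_size)]


def embed_record(table, dense, slots):
    """Turn one record into the input vector of the fully connected layers.

    Dense features pass through unchanged; the keys in each sparse slot are
    looked up and sum-pooled into one dense vector per slot.
    """
    out = list(dense)
    for keys in slots:
        pooled = [0.0] * len(table[0])
        for key in keys:
            for j, w in enumerate(table[key]):
                pooled[j] += w
        out.extend(pooled)
    return out


def _reader(records, batch_size, q):
    """Producer thread: cut the record stream into batches."""
    for start in range(0, len(records), batch_size):
        q.put(records[start:start + batch_size])  # blocks while the queue is full
    q.put(None)  # sentinel: no more batches


def prefetching_batches(records, batch_size, depth=2):
    """Yield batches produced asynchronously by a background reader thread."""
    q = queue.Queue(maxsize=depth)
    worker = threading.Thread(target=_reader, args=(records, batch_size, q), daemon=True)
    worker.start()
    while (batch := q.get()) is not None:
        yield batch
    worker.join()


if __name__ == "__main__":
    records = [
        ([0.5, 1.0], [[1, 2], [3], [0, 4]]),  # (dense features, keys per slot)
        ([0.0, 2.0], [[5], [6, 7], [8]]),
        ([1.5, 0.0], [[9], [], [2]]),
    ]
    table = make_table(vocab_size=10)
    for batch in prefetching_batches(records, batch_size=2):
        for dense, slots in batch:
            print(embed_record(table, dense, slots))
```

With two dense features and three slots of width 4, every record becomes a 14-element input vector regardless of how many keys each slot contains, which is what lets variable-length sparse inputs feed fixed-size fully connected layers.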

Inference, Hierarchical Deployment on Multiple GPUs

HugeCTR provides concurrent model inference execution across multiple GPUs through the use of a parameter server and embedding cache that are shared between multiple model instances. HugeCTR also leverages NVIDIA Triton™ Inference Server to ease workflows for data scientists and machine learning engineers when deploying models to production.
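The sharing pattern can be illustrated with a small least-recently-used (LRU) cache of hot embedding rows that sits in front of a parameter server holding the full table. Several model instances query the same cache, so a row fetched for one instance is served from the cache for the others. The class names, the LRU eviction policy, and the lock-based sharing are assumptions chosen for illustration; HugeCTR's own parameter server and GPU embedding cache are considerably more elaborate.

```python
import threading
from collections import OrderedDict


class ParameterServer:
    """Hypothetical stand-in for the store that holds the complete embedding table."""

    def __init__(self, table):
        self.table = table
        self.fetched_rows = 0  # how many rows had to come from the slower tier

    def fetch(self, keys):
        self.fetched_rows += len(keys)
        return {k: self.table[k] for k in keys}


class EmbeddingCache:
    """LRU cache of hot embedding rows, shared by several model instances."""

    def __init__(self, ps: ParameterServer, capacity: int):
        self.ps = ps
        self.capacity = capacity
        self.rows = OrderedDict()     # key -> embedding row, least recently used first
        self.lock = threading.Lock()  # instances may look up concurrently
        self.hits = 0
        self.misses = 0

    def lookup(self, keys):
        with self.lock:
            unique = list(dict.fromkeys(keys))  # de-duplicate, keep order
            missing = [k for k in unique if k not in self.rows]
            self.hits += len(unique) - len(missing)
            self.misses += len(missing)
            fetched = self.ps.fetch(missing) if missing else {}
            found = {}
            for k in unique:
                if k in self.rows:
                    self.rows.move_to_end(k)  # mark as most recently used
                    found[k] = self.rows[k]
                else:
                    found[k] = self.rows[k] = fetched[k]
            while len(self.rows) > self.capacity:
                self.rows.popitem(last=False)  # evict the least recently used row
            return [found[k] for k in keys]


if __name__ == "__main__":
    ps = ParameterServer({k: [float(k)] * 2 for k in range(100)})
    cache = EmbeddingCache(ps, capacity=3)
    print(cache.lookup([1, 2, 1]))  # instance A: rows 1 and 2 come from the parameter server
    print(cache.lookup([2, 3]))     # instance B: row 2 is a cache hit, row 3 is fetched
    print(cache.hits, cache.misses, ps.fetched_rows)  # 1 3 3
```

Eviction runs only after every requested row has been collected, so a request with more unique keys than the cache capacity still returns complete results.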

Learn more
NVIDIA Triton™ Inference Server

Interoperability with Open Source

Machine learning engineers and data scientists use a mix of methods, libraries, tools, and frameworks that often include open-source components. HugeCTR is an open-source component of NVIDIA Merlin designed to optimize embedding training within recommender workflows. It interoperates with other open-source frameworks: its SparseOperationsKit (SOK) is compatible with TensorFlow distribution strategies (tf.distribute) and Horovod.

Embeddings Optimization

Embeddings optimization enables more experimentation, faster fine-tuning, and better predictions at scale. HugeCTR's optimized embedding implementation is up to 8X faster than the embedding layers of other frameworks. It is also available as a TensorFlow plug-in that works with TensorFlow as a drop-in replacement for TensorFlow's native embedding layers.

Learn more  
HugeCTR: Embeddings optimization

Get Started with Merlin HugeCTR

All NVIDIA Merlin components are available as open-source projects on GitHub. A more convenient way to use them, however, is through the Merlin HugeCTR containers in the NVIDIA NGC catalog. Containers package the software application, libraries, dependencies, and runtime compilers in a self-contained environment, which makes the application environment portable, consistent, reproducible, and independent of the host system's software configuration.

Merlin Training

The NGC container lets users preprocess data, engineer features, and train deep learning-based recommender models with HugeCTR.

Merlin Inference

HugeCTR supports Triton Inference Server to provide GPU-accelerated inference. The NGC container enables users to deploy Merlin NVTabular workflows and HugeCTR models to Triton Inference Server for production.

HugeCTR on GitHub

The GitHub repo helps users get started with HugeCTR and quickly train a model using a Python interface. Available resources include documentation, tutorials, examples, and notebooks.

Built on NVIDIA AI

NVIDIA AI empowers millions of hands-on practitioners and thousands of companies to use the NVIDIA AI Platform to accelerate their workloads. NVIDIA Merlin is part of the NVIDIA AI Platform and is built on additional NVIDIA AI software within the platform.


RAPIDS

RAPIDS is a suite of open-source software libraries and APIs for running end-to-end data science and analytics pipelines entirely on GPUs.


cuDF

cuDF is a Python GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.


NVIDIA Triton Inference Server

Take advantage of NVIDIA Triton™ Inference Server to run inference efficiently on GPUs by maximizing throughput with the right combination of latency and GPU utilization.


Merlin HugeCTR Resources

Explore all Merlin resources.

Tencent and Merlin HugeCTR

Learn how Tencent moved its real-world advertising recommendation training to Merlin HugeCTR and achieved a speedup of more than 7X over its original TensorFlow solution on the same GPU platform.

GPU Accelerated Recommender Systems Training and Inference

In this paper, accepted at ACM RecSys 2022, learn about NVIDIA Merlin HugeCTR, a GPU-accelerated framework for click-through rate (CTR) estimation that is optimized for both training and inference. HugeCTR enables training at scale with model-parallel embeddings and data-parallel neural networks.

Best Practices from Tencent

Discover insights, advice, and best practices about leading the design and development of Tencent's deep learning recommendations system.

Meituan and Merlin HugeCTR

Learn how Meituan optimizes its machine learning platform by building a high-performance deep learning training framework deployed on CPU and GPU clusters.

HugeCTR is available to download from the NVIDIA NGC catalog or from the GitHub repository.
