Consumer Internet Developer Resources

A hub of news and technical resources for developers working in the consumer internet industry.

Consumer Internet Resources

Recommender System


NVIDIA Merlin is an end-to-end recommender-on-GPU framework that provides fast feature engineering and high training throughput to accelerate experimentation and production retraining of deep learning recommender models. Merlin also enables low-latency, high-throughput, production inference.

Merlin for Recommender Systems

Conversational AI and Natural Language Processing


The NVIDIA Jarvis framework includes pretrained conversational AI models, tools, and optimized end-to-end services for speech, vision, and natural language understanding (NLU) tasks. In addition to AI services, Jarvis enables you to fuse vision, audio, and other sensor inputs simultaneously to deliver capabilities such as multi-user, multi-context conversations in applications such as virtual assistants, multi-user diarization, and call center assistants.

Jarvis for Conversational AI and NLU


Using NVIDIA NeMo™, researchers and developers can build state-of-the-art conversational AI models using easy-to-use application programming interfaces.

NeMo for Conversational AI

Computer Vision


The NVIDIA DeepStream SDK lets you build and deploy AI-powered intelligent video analytics (IVA) applications and services. DeepStream offers a multi-platform scalable framework with Transport Layer Security (TLS) for deploying on the edge and connecting to any cloud.

DeepStream SDK for Intelligent Video Analytics

Transfer Learning Toolkit

The NVIDIA Transfer Learning Toolkit makes it possible to create accurate and efficient AI models for intelligent video analytics (IVA) and computer vision applications without expertise in AI frameworks. Developers, researchers, and software partners building intelligent vision AI apps and services can bring their own data to fine-tune pre-trained models instead of going through the hassle of training from scratch.

TLT for Intelligent Video Analytics

Deep Learning SDKs


NVIDIA® CUDA-X AI™ is a complete deep learning software stack for researchers and software developers to build high-performance, GPU-accelerated applications for conversational AI, recommendation systems, and computer vision. CUDA-X AI libraries deliver world-leading performance for both training and inference across industry benchmarks such as MLPerf.

Accelerate AI Training


NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications.

Boost Inference Capabilities

Data Science


NVIDIA RAPIDS™ is an open-source suite of data processing and machine learning libraries, developed by NVIDIA, that enables GPU acceleration for data science workflows. RAPIDS relies on NVIDIA’s CUDA® language, allowing users to leverage GPU processing and high-bandwidth GPU memory through user-friendly Python interfaces.

Speed Up Data Science

Apache Spark 3.0

GPU-accelerated Apache Spark 3.0 speeds up data processing and model training in data science pipelines without code changes, while substantially lowering infrastructure costs.

Speed Up Data Processing

Profiling Tools

Deep Learning Profiler

The Deep Learning Profiler (DLProf) is a tool for profiling deep learning models. It helps data scientists understand and improve model performance, either visually through TensorBoard or by analyzing text reports.

Improve Performance with DLProf

NVIDIA Nsight Systems

NVIDIA Nsight™ Systems is a system-wide performance analysis tool designed to visualize an application’s algorithms, help you identify the largest opportunities for optimization, and tune to scale efficiently across any quantity or size of CPUs and GPUs, from large servers to the smallest system on a chip (SoC).

Scale Your AI System with Nsight

Pre-Qualified Containers

NGC provides a range of options that meet the needs of data scientists, developers, and researchers with various levels of AI expertise. Quickly deploy AI frameworks with containers, get a head start with pre-trained models or model training scripts, and use domain-specific workflows and Helm charts for the fastest AI implementations, giving you faster time to solution.

Pre-Qualified Containers for AI

Kubernetes on NVIDIA GPUs

Kubernetes on NVIDIA GPUs enables enterprises to scale up training and inference deployment to multi-cloud GPU clusters seamlessly. It lets you automate the deployment, maintenance, scheduling, and operation of multiple GPU-accelerated application containers across clusters of nodes.

Scale Enterprise AI with Kubernetes

NVIDIA Data Center GPU Manager

The NVIDIA Data Center GPU Manager (DCGM) is a set of tools for managing and monitoring NVIDIA GPUs in cluster environments. It's a low-overhead tool suite that performs a variety of functions on each host system, including active health monitoring, diagnostics, system validation, policies, power and clock management, group configuration, and accounting.

NVIDIA Data Center GPU Manager for Clusters

Making Data Science Teams Productive with Kubernetes and RAPIDS

Data collected on a vast scale has fundamentally changed the way organizations do business, driving demand for teams to provide meaningful data science, machine learning, and deep learning-based business insights quickly. Learn how data science leaders can use RAPIDS to boost their teams’ productivity while optimizing their costs and minimizing deployment time.

Increase Data Science Productivity

Framework for GPU-Accelerated Conversational AI Applications

Real-time conversational AI is a complex and challenging task. Explore NVIDIA Jarvis and how to access its high-performance conversational AI services easily and quickly with just a few commands.

Framework for Conversational AI

Accelerating Wide-and-Deep Recommender Inference on GPUs

This blog describes a highly optimized, GPU-accelerated inference implementation of a wide-and-deep model based on TensorFlow’s DNNLinearCombinedClassifier API. The proposed solution allows for easy conversion from a trained TensorFlow wide-and-deep model to a mixed-precision inference deployment.

Mixed Precision Inference Deployment

Accelerating Apache Spark 3.0 with GPUs and RAPIDS

NVIDIA has worked with the Apache Spark community to implement GPU acceleration with the release of Spark 3.0 and the open-source RAPIDS Accelerator for Spark. In this blog, learn how the RAPIDS Accelerator for Apache Spark uses GPUs to speed up Spark SQL and DataFrame operations without requiring any code changes, and to run end-to-end data preparation and model training on the same Spark cluster.

RAPIDS Accelerator for Spark
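
The "no code changes" point above comes from the fact that acceleration is enabled through Spark configuration rather than through the application itself. The following is a minimal sketch of that idea, not an official launch recipe: the helper `rapids_submit_args` and both file names are hypothetical, and a real deployment typically needs additional settings (for example GPU discovery and per-task GPU amounts) described in the RAPIDS Accelerator documentation.

```python
# Sketch: GPU acceleration is switched on through Spark configuration,
# which is why the application code itself does not change.
# The helper name and the file names below are hypothetical.


def rapids_submit_args(app_path, plugin_jar, gpus_per_executor=1):
    """Build a spark-submit argument list that enables the RAPIDS Accelerator."""
    conf = {
        # Load the RAPIDS Accelerator SQL plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow supported SQL and DataFrame operations to run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # Spark 3.0 GPU-aware scheduling: GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
    }
    args = ["spark-submit", "--jars", plugin_jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    # The unmodified Spark application goes last.
    args.append(app_path)
    return args


if __name__ == "__main__":
    # Hypothetical application and jar names, for illustration only.
    print(" ".join(rapids_submit_args("etl_job.py", "rapids-4-spark.jar")))
```

The same `etl_job.py` would run on CPUs if the plugin settings were left out, which is the sense in which the application is unchanged.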

Training and Fine-Tuning BERT Using NVIDIA NGC

BERT (Bidirectional Encoder Representations from Transformers) provides a game-changing twist to the field of natural language processing (NLP). It runs on supercomputers powered by NVIDIA GPUs to train its huge neural networks and achieve unprecedented NLP accuracy, approaching the level of human language understanding. AI like this has been anticipated for many decades. With BERT, it’s finally arrived.

Fine-Tune NLP

Training Framework for Recommender Systems

Click-through rate (CTR) estimation is one of the most critical components of modern recommender systems. In this blog, get an introduction to HugeCTR, a GPU-accelerated training framework for CTR estimation and a pillar of NVIDIA Merlin. HugeCTR, on a single NVIDIA V100 Tensor Core GPU, achieves a speedup of up to 114X over TensorFlow on a 40-core CPU node and up to 8.3X that of TensorFlow on the same V100 GPU.

Framework for Recommenders
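
For readers new to the task, CTR estimation means predicting the probability that a user clicks a given item. The sketch below illustrates the task with plain logistic regression on toy data. It is not HugeCTR's API or model architecture: the function names, the two binary features, and the data are all made up for illustration.

```python
import math

# Toy illustration of CTR estimation: predict the probability of a click
# from a feature vector. This is plain logistic regression trained with
# SGD, not HugeCTR; all names and data here are hypothetical.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_ctr(weights, bias, x):
    """Return the estimated click probability for feature vector x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_ctr_model(rows, labels, epochs=200, lr=0.5):
    """Fit logistic regression by stochastic gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict_ctr(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


# Hypothetical impressions: [is_mobile, seen_item_before] -> clicked (1) or not (0).
rows = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_ctr_model(rows, labels)

if __name__ == "__main__":
    print("P(click | mobile):", round(predict_ctr(weights, bias, [1, 0]), 3))
    print("P(click | desktop):", round(predict_ctr(weights, bias, [0, 0]), 3))
```

Production CTR models work with far larger, mostly categorical feature sets, which is the training workload the paragraph above describes HugeCTR accelerating on GPUs.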

Linguistically Informed BERT with Kubeflow at LinkedIn

LinkedIn is working on multi-node, multi-GPU experiments with Kubeflow’s MPIJob operator, pre-training BERT using LinkedIn’s data, and hyper-parameter tuning with Microsoft Neural Network Intelligence (NNI) using Kubeflow to schedule distributed-training trials. This talk explains how the team trained models, including the fine-tuning, knowledge distillation, and model and experiment performance that were involved. The talk also discusses link prediction, including member-to-member links and member-to-entity links such as skill, title, and company.

Fine-Tune with GPU-Accelerated BERT

How Machine Learning Powers On-Demand Logistics at DoorDash

Discover how DoorDash tackles the vehicle routing problem with trillions of combinations, delivery time predictions for all your favorite restaurants, and dynamic supply/demand balance. Engineers talk about improvements in machine learning training times between CPU and GPU, as well as between single-GPU and multi-GPU.

How DoorDash Improves Logistics with ML

Democratizing Conversational AI with Square Assistant

Square is developing Square Assistant, a conversational AI application that empowers small businesses to communicate with customers more efficiently. This talk covers the smorgasbord of deep learning models employed to understand and respond to messages at scale, and how they exemplify a user-centric approach to AI that continues to learn and improve over time.

How Square is Making Conversational AI Accessible

Democratized ML Pipelines and Spark RAPIDS-Based Hyperparameter Tuning at Verizon Media

XGBoost is an optimized, distributed gradient boosting library that has been applied to a variety of data science problems through its classification and regression capabilities. Learn how engineers and scientists from Verizon Media used GPU-based distributed XGBoost to move past the limits of their CPU-based XGBoost solution. The result was an enhanced capability to run hyperparameter searches for optimized models and maximum return.

How Verizon Media Optimized its Data Science Workflow

Speed Up Your Data Science Tasks by a Factor of 100+ Using AzureML and NVIDIA RAPIDS

Azure Machine Learning service is the first major cloud machine learning service to integrate NVIDIA RAPIDS, enabling parallel, high-performance computing through a few steps in a simple Jupyter Notebook. Learn how AzureML and RAPIDS dramatically accelerate common data science tasks by leveraging the power of NVIDIA GPUs scaled out to an AzureML cluster of multiple GPU nodes running Dask.

Accelerate Your Data Science Workflows


The NVIDIA Deep Learning Institute (DLI) offers hands-on training in AI and accelerated computing to solve real-world problems. Training is available as self-paced, online courses or in-person, instructor-led workshops.

View All Courses
