Deep Learning Examples

Deep neural network architectures consist of a large number of parameterized, differentiable functions whose weights are learned using gradient-based optimization. To achieve state-of-the-art performance for any given application, researchers and data scientists experiment with a wide range of architectures with varying numbers of layers, types of functions, and training algorithms. This means deep learning platforms must not only be fast, but also easily programmable.

In recent years, multiple neural network architectures have emerged, designed to solve specific problems such as object detection, language translation, and recommendation. These architectures are further adapted to handle different data sizes, formats, and resolutions when applied to domains such as medical imaging, autonomous driving, and financial services.

Academic and industry researchers and data scientists rely on the flexibility of the NVIDIA platform to prototype, explore, train, and deploy a wide variety of deep neural network architectures using GPU-accelerated deep learning frameworks such as Caffe2, Chainer, Microsoft Cognitive Toolkit, MXNet, PaddlePaddle, PyTorch, and TensorFlow, as well as inference optimizers such as TensorRT.

Pre-packaged Samples With NGC Containers

NGC deep learning framework containers include code samples optimized for Tensor Cores, built on NVIDIA's optimized deep learning software stack. All samples are tuned, tested, and maintained by NVIDIA. Get started quickly by pulling the latest container from NGC.

Get From NGC

Open-Source Samples on GitHub

All Tensor Core-optimized code samples are open source. Use these samples to learn how they are implemented, train with your own data, or integrate them into your applications. File issues and submit pull requests directly in the GitHub repository.

Get From GitHub

Note About Tensor Cores

Designed specifically for deep learning, Tensor Cores on newer GPUs such as the Tesla V100 and Titan V deliver significantly higher training and inference performance compared to full-precision (FP32) computation. Each Tensor Core performs a matrix multiply in half precision (FP16) and accumulates the results in full precision (FP32). This key capability enables Volta to deliver 3X speedups in training and inference over the previous generation. All samples are optimized to take advantage of Tensor Cores and have been tested for accuracy and convergence.
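
The value of FP32 accumulation can be illustrated with a small CPU sketch (not actual Tensor Core code): emulating half-precision accumulation by hand shows how much accuracy FP32 accumulation preserves.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float16)
b = rng.standard_normal((64, 64)).astype(np.float16)

def matmul_fp16_accum(a, b):
    # Multiply AND accumulate entirely in half precision: the running
    # sum is rounded back to FP16 after every addition.
    out = np.zeros((a.shape[0], b.shape[1]), dtype=np.float16)
    for k in range(a.shape[1]):
        out = out + np.outer(a[:, k], b[k, :])  # float16 + float16 -> float16
    return out

# Tensor Core style: FP16 inputs, products accumulated in FP32.
fp32_accum = a.astype(np.float32) @ b.astype(np.float32)
fp16_accum = matmul_fp16_accum(a, b)

# Compare both against a float64 reference.
reference = a.astype(np.float64) @ b.astype(np.float64)
err16 = np.abs(fp16_accum.astype(np.float64) - reference).max()
err32 = np.abs(fp32_accum.astype(np.float64) - reference).max()
print(err16 > err32)  # True: FP32 accumulation is markedly more accurate
```

In practice the frameworks listed above expose this through their mixed-precision training paths, so user code rarely manipulates precision by hand.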

Models by Application Areas

Click on the application area to jump directly to that section:

Computer Vision

Computer vision deals with algorithms and techniques for computers to understand the world around us using image and video data; in other words, teaching machines to automate the tasks performed by human visual systems. Common computer vision tasks include image classification, object detection in images and videos, image segmentation, and image restoration. In recent years, deep learning has revolutionized the field of computer vision with algorithms that deliver superhuman accuracy on these tasks. Below is a list of popular deep neural network models used in computer vision and their open-source implementations.

ResNet50: Residual network architecture that introduced “skip connections” and won 1st place in the ILSVRC 2015 classification task.
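
The core idea behind a skip connection can be sketched in a few lines of plain numpy; the weights below are random placeholders, not a trained ResNet.

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.standard_normal((64, 64)) * 0.01  # placeholder weights
w2 = rng.standard_normal((64, 64)) * 0.01

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x):
    # The block learns a residual F(x) that is added back to its input,
    # so gradients can flow unchanged through the identity path.
    out = relu(x @ w1)    # first transformation
    out = out @ w2        # second transformation
    return relu(out + x)  # skip connection: add the input back

x = rng.standard_normal((1, 64))
y = residual_block(x)
print(y.shape)  # (1, 64)
```

This identity path is what allows residual networks to be trained at depths of 50+ layers without the degradation seen in plain stacked networks.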

Inception v3: Version 3 of the Inception architecture; the original Inception network (GoogLeNet) won the ILSVRC 2014 classification task. The inception module drastically reduces the number of parameters in the network.

VGG16/19: Runner-up at the ILSVRC 2014 classification task.

LeNet: Image classification network used to recognize handwritten digits. Developed by Yann LeCun, LeNet was one of the first successful applications of convolutional neural networks.

CIFAR10: CIFAR10 is a dataset of images with 10 classes. This model architecture, based on AlexNet, is designed to achieve good accuracy (not state-of-the-art) and can be used as a starting point for experimenting with alternative approaches.

FastPhotoStyle: Fast photorealistic style transfer network. It takes a content photo and a style photo as input and transfers the style of the style photo to the content photo.

Video Super-Resolution (CNN + FlowNet): Implementation of VSRNet, an end-to-end network for video super-resolution with motion compensation.

FlowNet-S (Image Encoder/Decoder): Implementation of the FlowNet-S optical flow estimation network.

  • [PyTorch] (Tensor Core version coming soon)

Speech and Natural Language Processing

Natural-language processing (NLP) deals with algorithms and techniques for computers to understand, interpret, manipulate, and converse in human languages. NLP algorithms can work with audio and text data and transform them into audio or text outputs. Common NLP tasks include sentiment analysis, speech recognition, speech synthesis, language translation, and natural-language generation. Deep learning algorithms enable end-to-end training of NLP models without the need to hand-engineer features from raw input data. Below is a list of popular deep neural network models used in natural language processing and their open-source implementations.

OpenSeq2Seq: An open-source framework built on TensorFlow that provides building blocks for training encoder-decoder models such as GNMT and DeepSpeech2 for machine translation and speech recognition applications.

GNMT: Google's Neural Machine Translation System, included as part of OpenSeq2Seq sample.

Unsupervised Sentiment Discovery: Scalable implementation of “Learning to Generate Reviews and Discovering Sentiment” by OpenAI.

WordLevel: Language modeling architecture using multi-layer RNNs (Elman, GRU, or LSTM).
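
The recurrence at the heart of such word-level models can be sketched with a toy Elman RNN in numpy; vocabulary size, hidden size, and weights below are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 10, 16
embed = rng.standard_normal((vocab, hidden)) * 0.1  # token embeddings
w_xh = rng.standard_normal((hidden, hidden)) * 0.1  # input-to-hidden
w_hh = rng.standard_normal((hidden, hidden)) * 0.1  # hidden-to-hidden
w_hy = rng.standard_normal((hidden, vocab)) * 0.1   # output projection

def rnn_language_model(tokens):
    h = np.zeros(hidden)
    logits = []
    for t in tokens:                      # process the sequence in order
        x = embed[t]
        h = np.tanh(x @ w_xh + h @ w_hh)  # Elman recurrence
        logits.append(h @ w_hy)           # scores over the next token
    return np.stack(logits)

out = rnn_language_model([1, 4, 2, 7])
print(out.shape)  # (4, 10): one set of next-token scores per position
```

GRU and LSTM variants replace the single tanh update with gated updates, which helps the hidden state carry information across longer spans of text.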

DeepSpeech2: End-to-end speech recognition model developed by Baidu, included as part of the OpenSeq2Seq sample.

Transformer: Transformer model architecture based on the optimized implementation in Facebook's Fairseq NLP toolkit, built on top of PyTorch.
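
The Transformer's central operation, scaled dot-product attention, is compact enough to sketch in numpy; shapes here are illustrative, not the Fairseq implementation.

```python
import numpy as np

def attention(q, k, v):
    # softmax(Q K^T / sqrt(d)) V : each query position mixes the value
    # vectors, weighted by its similarity to every key position.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))  # 5 query positions, dimension 8
k = rng.standard_normal((7, 8))  # 7 key positions
v = rng.standard_normal((7, 8))  # one value vector per key
out, w = attention(q, k, v)
print(out.shape, w.shape)  # (5, 8) (5, 7)
```

Because every position attends to every other in one matrix multiply, attention parallelizes far better on GPUs than the step-by-step recurrences above.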

Recommender Systems

Recommender systems, or recommendation engines, are algorithms that suggest particular products or items out of a larger set of possibilities based on user behavior and attributes. Common recommender system applications include recommendations for movies, music, news, books, search queries, and other products. Below are examples of popular deep neural network models used for recommender systems.

DeepRecommender: A deep autoencoder-based end-to-end recommender system.

Neural Collaborative Filtering (NCF): A common technique powering recommender systems used in a wide array of applications such as online shopping, media streaming, social media, and ad placement.
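
The idea behind NCF can be sketched in numpy: learned user and item embeddings are combined and passed through a small MLP to predict an interaction score. All sizes and weights below are toy placeholders, not the reference NCF implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 100, 50, 8
user_emb = rng.standard_normal((n_users, dim)) * 0.1  # learned in training
item_emb = rng.standard_normal((n_items, dim)) * 0.1
w1 = rng.standard_normal((2 * dim, 16)) * 0.1         # MLP weights
w2 = rng.standard_normal((16, 1)) * 0.1

def predict(user_id, item_id):
    # Concatenate the two embeddings and score them with a small MLP;
    # the sigmoid output is the predicted interaction probability.
    x = np.concatenate([user_emb[user_id], item_emb[item_id]])
    h = np.maximum(x @ w1, 0.0)             # hidden layer (ReLU)
    return 1.0 / (1.0 + np.exp(-(h @ w2)))  # sigmoid

score = predict(3, 12)
print(0.0 < score[0] < 1.0)  # True
```

Replacing the inner-product scoring of classic matrix factorization with an MLP is what lets NCF capture non-linear user-item interactions.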

Generative Adversarial Networks

Generative Adversarial Networks (GANs) are unsupervised deep learning techniques that learn a distribution over input images by pitting two neural networks against each other: one that generates (randomly samples) images and another that discriminates (classifies) images as real or fake. Common applications enabled by GANs include photorealistic image generation, 3D object generation, image editing, super-resolution, and synthetic data generation for training. Below are examples of popular deep neural network models for GANs.
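
The opposing objectives of the two networks can be sketched in numpy with stand-in linear functions for the generator and discriminator; nothing here is trained, it only shows how the two losses are computed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w_d = rng.standard_normal(4) * 0.1       # stand-in discriminator weights
w_g = rng.standard_normal((2, 4)) * 0.1  # stand-in generator weights

real = rng.standard_normal((8, 4))  # samples from the data distribution
z = rng.standard_normal((8, 2))     # random noise fed to the generator
fake = z @ w_g                      # generator output G(z)

d_real = sigmoid(real @ w_d)  # D's belief that real samples are real
d_fake = sigmoid(fake @ w_d)  # D's belief that fake samples are real

# The discriminator tries to label real as 1 and fake as 0 ...
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
# ... while the generator tries to make D label its fakes as 1.
g_loss = -np.mean(np.log(d_fake))
print(d_loss > 0 and g_loss > 0)  # True: both cross-entropy losses are positive
```

Training alternates gradient steps on these two losses until the generator's samples become hard to distinguish from real data.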

Pix2pixHD: High-resolution photorealistic image-to-image translation with semantic manipulation.

  • [PyTorch] (Tensor Core version coming soon)

Performance Guide

NVIDIA GPUs accelerate diverse application areas, from vision to speech and from recommender systems to generative adversarial networks (GANs).

They also support every deep learning framework across multiple network types, including convolutional neural networks (CNNs), recurrent neural networks (RNNs) and more.

See how optimized NGC containers and NVIDIA’s complete solution stack power your deep learning research.