In recent years, multiple neural network architectures have emerged, designed to solve specific problems such as object detection, language translation, and recommendation engines. These architectures are further adapted to handle different data sizes, formats, and resolutions when applied to multiple domains in medical imaging, autonomous driving, financial services and others.
Academic and industry researchers and data scientists rely on the flexibility of the NVIDIA platform to prototype, explore, train, and deploy a wide variety of deep neural network architectures using GPU-accelerated deep learning frameworks such as MXNet, PyTorch, and TensorFlow, and inference optimizers such as TensorRT.
Designed specifically for deep learning, Tensor Cores on Volta and Turing GPUs deliver significantly higher training and inference performance compared to full-precision (FP32) training. Each Tensor Core performs matrix multiplication in half precision (FP16) and accumulates the results in full precision (FP32). This key capability enables Volta to deliver 3X performance speedups in training and inference over the previous generation. All samples are optimized to take advantage of Tensor Cores and have been tested for accuracy and convergence. You can access these reference implementations through NVIDIA NGC and GitHub.
Tensor Core-optimized training code samples ship with NVIDIA-optimized PyTorch, MXNet, and TensorFlow containers.
Tensor Core-optimized training code samples: learn how they are implemented, train with your own data, or integrate them into your applications.
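The FP16-multiply, FP32-accumulate behavior described above can be illustrated without a GPU. The following is a minimal sketch, not NVIDIA's implementation: it emulates the rounding in plain Python with the standard `struct` module, and the helper names (`to_fp16`, `to_fp32`, `dot_fp16_inputs`) and the test values are assumptions chosen for illustration. It compares a dot product whose running sum is kept in FP16 with one whose running sum is kept in FP32.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_inputs(a, b, round_accumulator):
    """Dot product of FP16-rounded inputs.

    round_accumulator is applied to the running sum after every addition,
    so it decides the precision in which results are accumulated.
    """
    total = 0.0
    for x, y in zip(a, b):
        # The product of two FP16 values fits exactly in a Python float.
        product = to_fp16(x) * to_fp16(y)
        total = round_accumulator(total + product)
    return total


# Illustrative inputs (an assumption, not taken from any NVIDIA sample).
n = 4096
a = [0.1] * n
b = [0.1] * n

# Reference value computed in double precision from the FP16-rounded inputs.
reference = n * to_fp16(0.1) ** 2

fp16_sum = dot_fp16_inputs(a, b, to_fp16)  # accumulate in half precision
fp32_sum = dot_fp16_inputs(a, b, to_fp32)  # accumulate in full precision

print("reference:       ", reference)
print("FP16 accumulator:", fp16_sum)
print("FP32 accumulator:", fp32_sum)
```

With these inputs the FP16 accumulator stops growing once the running sum is large enough that each small product rounds away, while the FP32 accumulator stays close to the reference. This is one reason accumulating in FP32 helps mixed-precision training keep its accuracy.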
Model Scripts by Application Areas
Computer Vision
Computer vision deals with algorithms and techniques that let computers understand the world around us from image and video data. In other words, it teaches machines to automate the tasks performed by the human visual system. Common computer vision tasks include image classification, object detection in images and videos, image segmentation, and image restoration. In recent years, deep learning has revolutionized the field of computer vision with algorithms that deliver super-human accuracy on these tasks. Below is a list of popular deep neural network models used in computer vision and their open-source implementations.
ResNet-50: The residual network architecture introduced “skip connections” and won first place in the ILSVRC 2015 classification task.
SSD: The SSD320 v1.2 model is based on the SSD: Single Shot MultiBox Detector paper, which describes SSD as "a method for detecting objects in images using a single deep neural network".
Mask R-CNN: NVIDIA's Mask R-CNN 19.2 is an optimized version of Facebook's implementation, leveraging mixed-precision arithmetic and Tensor Cores on V100 GPUs for 1.3x faster training times while maintaining target accuracy.
UNET-Industrial: This U-Net model is adapted from the original U-Net model, which is a convolutional auto-encoder for 2D image segmentation.
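The “skip connections” mentioned in the ResNet-50 entry above add a block's input back onto its output, so the block only has to learn a residual. The following is a minimal sketch in plain Python, assuming list-based vectors and a hypothetical `tiny_layer` stand-in for the convolutional layers; it is a simplification for illustration and not the ResNet-50 implementation referenced above.

```python
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, e) for e in v]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return transform(x) + x, where "+ x" is the skip connection."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the input size")
    return [f + xi for f, xi in zip(fx, x)]


def tiny_layer(v: Vector) -> Vector:
    """Hypothetical stand-in for a learned layer: scale by 0.5, then ReLU."""
    return relu([0.5 * e for e in v])


x = [1.0, -2.0, 3.0]
out = residual_block(x, tiny_layer)
print(out)

# If the learned transform outputs zeros, the block passes x through unchanged.
identity_out = residual_block(x, lambda v: [0.0] * len(v))
print(identity_out)
```

Because the input is carried through unchanged whenever the transform contributes nothing, stacking many such blocks does not force the signal through every layer's transformation.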
Natural Language Processing
Natural-language processing (NLP) deals with algorithms and techniques for computers to understand, interpret, manipulate, and converse in human languages. NLP algorithms can work with audio and text data and transform them into audio or text outputs. Common NLP tasks include sentiment analysis, speech recognition, speech synthesis, language translation, and natural-language generation. Deep learning algorithms enable end-to-end training of NLP models without the need to hand-engineer features from raw input data. Below is a list of popular deep neural network models used in natural language processing and their open-source implementations.
GNMT: Google's Neural Machine Translation System, included as part of the OpenSeq2Seq sample.
BERT: Bidirectional Encoder Representations from Transformers (BERT) is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. This model is based on the BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding paper. NVIDIA's BERT 19.03 is an optimized version of Google's official implementation, leveraging mixed-precision arithmetic and Tensor Cores on V100 GPUs for faster training times while maintaining target accuracy.
Transformer: This implementation of the Transformer model architecture is based on the optimized implementation in Facebook's Fairseq NLP toolkit, built on top of PyTorch. The original version in the Fairseq project was developed using Tensor Cores, which provide a significant training speedup. Our implementation improves training performance and is tested on a DGX-1V 16GB.
Recommender Systems
Recommender systems, or recommendation engines, are algorithms that offer ratings or suggestions for a particular product or item, chosen from among other possibilities, based on user behavior attributes. Common recommender system applications include recommendations for movies, music, news, books, search queries, and other products. Below are examples of popular deep neural network models used for recommender systems.
Neural Collaborative Filtering (NCF): A common technique powering recommender systems used in a wide array of applications such as online shopping, media streaming applications, social media, and ad placement.
Text to Speech
Tacotron 2 and WaveGlow: This text-to-speech (TTS) system is a combination of two neural network models: a modified Tacotron 2 model from the Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions paper and a flow-based neural network model from the WaveGlow: A Flow-based Generative Network for Speech Synthesis paper.
Automatic Speech Recognition
NVIDIA GPUs accelerate diverse application areas, from vision to speech and from recommender systems to generative adversarial networks (GANs).
They also support every deep learning framework across multiple network types, including convolutional neural networks (CNNs), recurrent neural networks (RNNs) and more.
See how optimized NGC containers and NVIDIA’s complete solution stack power your deep learning research.