NVIDIA Performance on MLPerf 0.6 AI Benchmarks

ResNet-50 v1.5 Time to Solution on V100

MXNet | Batch Size: refer to the CNN V100 Training table below | Precision: Mixed | Dataset: ImageNet2012 | Convergence criteria: refer to MLPerf requirements

Training Image Classification on CNNs

ResNet-50 v1.5 Throughput on V100

DGX-1: 8x Tesla V100-SXM2-32GB, E5-2698 v4 2.2 GHz | Batch Size = 256 | MXNet = 19.06-py3, TensorFlow and PyTorch = 19.07-py3 | Precision: Mixed | Dataset: ImageNet2012

ResNet-50 v1.5 Throughput on T4

Supermicro SYS-4029GP-TRT T4: 8x Tesla T4 16GB, Gold 6140 2.3 GHz | Batch Size: MXNet = 208, PyTorch = 256, TensorFlow = 128 | MXNet and TensorFlow = 19.05-py3, PyTorch = 19.07-py3 | Precision: Mixed | Dataset: ImageNet2012

Training Performance

NVIDIA Performance on MLPerf 0.6 AI Benchmarks

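Framework | Network | Network Type | Time to Solution | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version
MXNet | ResNet-50 v1.5 | CNN | 115.22 minutes | 8x V100 | DGX-1 | 0.6-8 | Mixed | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 57.87 minutes | 16x V100 | DGX-2 | 0.6-17 | Mixed | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 52.74 minutes | 16x V100 | DGX-2H | 0.6-19 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 2.59 minutes | 512x V100 | DGX-2H | 0.6-29 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 1.69 minutes | 1040x V100 | DGX-1 | 0.6-16 | Mixed | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 1.33 minutes | 1536x V100 | DGX-2H | 0.6-30 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 22.36 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | SSD-ResNet-34 | CNN | 12.21 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | COCO2017 | V100-SXM3-32GB
PyTorch | SSD-ResNet-34 | CNN | 11.41 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 4.78 minutes | 64x V100 | DGX-2H | 0.6-21 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 2.67 minutes | 240x V100 | DGX-1 | 0.6-13 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | SSD-ResNet-34 | CNN | 2.56 minutes | 240x V100 | DGX-2H | 0.6-24 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 2.23 minutes | 240x V100 | DGX-2H | 0.6-27 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 207.48 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | Mask R-CNN | CNN | 101 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | COCO2017 | V100-SXM3-32GB
PyTorch | Mask R-CNN | CNN | 95.2 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 32.72 minutes | 64x V100 | DGX-2H | 0.6-21 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 22.03 minutes | 192x V100 | DGX-1 | 0.6-12 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | Mask R-CNN | CNN | 18.47 minutes | 192x V100 | DGX-2H | 0.6-23 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 20.55 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | WMT16 English-German | V100-SXM2-16GB
PyTorch | GNMT | RNN | 10.94 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT | RNN | 9.87 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 2.12 minutes | 256x V100 | DGX-2H | 0.6-25 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 1.99 minutes | 384x V100 | DGX-1 | 0.6-14 | Mixed | WMT16 English-German | V100-SXM2-16GB
PyTorch | GNMT | RNN | 1.8 minutes | 384x V100 | DGX-2H | 0.6-26 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 20.34 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | WMT17 English-German | V100-SXM2-16GB
PyTorch | Transformer | Attention | 11.04 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | WMT17 English-German | V100-SXM3-32GB
PyTorch | Transformer | Attention | 9.8 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 2.41 minutes | 160x V100 | DGX-2H | 0.6-22 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 2.05 minutes | 480x V100 | DGX-1 | 0.6-15 | Mixed | WMT17 English-German | V100-SXM2-16GB
PyTorch | Transformer | Attention | 1.59 minutes | 480x V100 | DGX-2H | 0.6-28 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
TensorFlow | MiniGo | Reinforcement Learning | 27.39 minutes | 8x V100 | DGX-1 | 0.6-10 | Mixed | N/A | V100-SXM2-16GB
TensorFlow | MiniGo | Reinforcement Learning | 13.57 minutes | 24x V100 | DGX-1 | 0.6-11 | Mixed | N/A | V100-SXM2-16GB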
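
The time-to-solution figures above can be turned into speedup and scaling-efficiency numbers with a little arithmetic. The sketch below is illustrative only: it copies the ResNet-50 v1.5 (MXNet) rows from the table, takes the 8x V100 DGX-1 run as the baseline, and defines efficiency as speedup divided by the increase in GPU count. The names `RESNET50_RUNS` and `scaling_report` are hypothetical, and because the rows mix DGX-1, DGX-2 and DGX-2H servers, the results are a rough comparison rather than a controlled scaling study.

```python
# ResNet-50 v1.5 (MXNet) rows from the MLPerf 0.6 table:
# (GPU count, server, time to solution in minutes)
RESNET50_RUNS = [
    (8, "DGX-1", 115.22),
    (16, "DGX-2", 57.87),
    (16, "DGX-2H", 52.74),
    (512, "DGX-2H", 2.59),
    (1040, "DGX-1", 1.69),
    (1536, "DGX-2H", 1.33),
]


def scaling_report(runs):
    """Return (gpus, server, speedup, efficiency) relative to the first run.

    speedup    = baseline minutes / run minutes
    efficiency = speedup / (run GPUs / baseline GPUs)
    """
    base_gpus, _, base_minutes = runs[0]
    report = []
    for gpus, server, minutes in runs:
        speedup = base_minutes / minutes
        efficiency = speedup / (gpus / base_gpus)
        report.append((gpus, server, speedup, efficiency))
    return report


if __name__ == "__main__":
    for gpus, server, speedup, efficiency in scaling_report(RESNET50_RUNS):
        print(f"{gpus:>5} GPUs ({server}): "
              f"{speedup:6.2f}x speedup, {efficiency:.0%} efficiency")
```

An efficiency above 100% for a DGX-2H row would reflect the faster server relative to the DGX-1 baseline, not super-linear scaling.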

V100 Training Performance

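Framework | Network | Network Type | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | Inception V3 | CNN | 527 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | Inception V3 | CNN | 606 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | Inception V3 | CNN | 4105 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | Inception V3 | CNN | 4648 images/sec | 8x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 | CNN | 1409 images/sec | 1x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 | CNN | 1442 images/sec | 1x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 | CNN | 10380 images/sec | 8x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 | CNN | 10530 images/sec | 8x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 1422 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 208 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 1597 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 9566 images/sec | 8x V100 | DGX-1 | 19.06-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | ResNet-50 v1.5 | CNN | 11056 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 11507 images/sec | 8x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | Inception V3 | CNN | 543 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | Inception V3 | CNN | 629 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | Inception V3 | CNN | 4156 images/sec | 8x V100 | DGX-1 | 19.03-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | Mask R-CNN | CNN | 14 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 4 | COCO2014 | V100-SXM2-32GB
PyTorch | Mask R-CNN | CNN | 17 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 16 | COCO2014 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 88 images/sec | 8x V100 | DGX-1 | 19.06-py3 | Mixed | 16 | COCO2014 | V100-SXM2-32GB
PyTorch | ResNet-50 | CNN | 819 images/sec | 1x V100 | DGX-1 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
PyTorch | ResNet-50 | CNN | 820 images/sec | 1x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
PyTorch | ResNet-50 | CNN | 6218 images/sec | 8x V100 | DGX-1 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
PyTorch | ResNet-50 v1.5 | CNN | 928 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
PyTorch | ResNet-50 v1.5 | CNN | 1036 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | ResNet-50 v1.5 | CNN | 7288 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
PyTorch | SSD v1.1 | CNN | 225 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 64 | COCO 2017 | V100-SXM2-32GB
PyTorch | SSD v1.1 | CNN | 299 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 64 | COCO 2017 | V100-SXM3-32GB-H
PyTorch | SSD v1.1 | CNN | 2018 images/sec | 8x V100 | DGX-1 | 19.06-py3 | Mixed | 64 | COCO 2017 | V100-SXM2-32GB
PyTorch | Tacotron2 | CNN | 11743 total input tokens/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM2-16GB
PyTorch | Tacotron2 | CNN | 16847 total input tokens/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB-H
PyTorch | Tacotron2 | CNN | 81410 total input tokens/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM2-32GB
PyTorch | Tacotron2 | CNN | 104447 total input tokens/sec | 8x V100 | DGX-2H | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB-H
PyTorch | WaveGlow | CNN | 77780 output samples/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM2-16GB
PyTorch | WaveGlow | CNN | 91052 output samples/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM3-32GB-H
PyTorch | WaveGlow | CNN | 533364 output samples/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM2-32GB
TensorFlow | Inception V3 | CNN | 542 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | Inception V3 | CNN | 626 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
TensorFlow | Inception V3 | CNN | 4097 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | ResNet-50 v1.5 | CNN | 840 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
TensorFlow | ResNet-50 v1.5 | CNN | 965 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
TensorFlow | ResNet-50 v1.5 | CNN | 6474 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
TensorFlow | SSD v1.1 | CNN | 114 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 32 | COCO 2017 | V100-SXM2-32GB
TensorFlow | SSD v1.1 | CNN | 125 images/sec | 1x V100 | DGX-2 | 19.07-py3 | Mixed | 32 | COCO 2017 | V100-SXM3-32GB
TensorFlow | SSD v1.1 | CNN | 665 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM2-32GB
TensorFlow | SSD v1.1 | CNN | 770 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM3-32GB
TensorFlow | U-Net Industrial | CNN | 95 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 16 | DAGM2007 | V100-SXM2-32GB
TensorFlow | U-Net Industrial | CNN | 110 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 16 | DAGM2007 | V100-SXM3-32GB-H
TensorFlow | U-Net Industrial | CNN | 492 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 2 | DAGM2007 | V100-SXM2-32GB
TensorFlow | U-Net Industrial | CNN | 515 images/sec | 8x V100 | DGX-2 | 19.07-py3 | Mixed | 2 | DAGM2007 | V100-SXM3-32GB
PyTorch | GNMT V2 | RNN | 75975 total tokens/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-32GB
PyTorch | GNMT V2 | RNN | 82766 total tokens/sec | 1x V100 | DGX-2 | 19.07-py3 | Mixed | 128 | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT V2 | RNN | 582605 total tokens/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-32GB
TensorFlow | GNMT V2 | RNN | 22471 total tokens/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 192 | WMT16 English-German | V100-SXM2-16GB
TensorFlow | GNMT V2 | RNN | 26039 total tokens/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 192 | WMT16 English-German | V100-SXM3-32GB-H
TensorFlow | GNMT V2 | RNN | 149008 total tokens/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 192 | WMT16 English-German | V100-SXM2-16GB
PyTorch | NCF | Recommender | 22093850 samples/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-16GB
PyTorch | NCF | Recommender | 24473776 samples/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM3-32GB-H
PyTorch | NCF | Recommender | 104122673 samples/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-16GB
PyTorch | NCF | Recommender | 109969915 samples/sec | 8x V100 | DGX-2H | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM3-32GB-H
TensorFlow | NCF | Recommender | 26415693 samples/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-16GB
TensorFlow | NCF | Recommender | 56183685 samples/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
TensorFlow | BERT-LARGE | Attention | 32 sentences/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM2-32GB
TensorFlow | BERT-LARGE | Attention | 37 sentences/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM3-32GB-H
TensorFlow | BERT-LARGE | Attention | 147 sentences/sec | 8x V100 | DGX-1 | 19.06-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM2-32GB
TensorFlow | BERT-LARGE | Attention | 189 sentences/sec | 8x V100 | DGX-2H | 19.07-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM3-32GB-H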
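
One way to read the V100 throughput rows is to compare each 8x V100 result against eight times the matching 1x V100 result. The sketch below does that for a handful of DGX-1 pairs copied from the table, chosen because the 1x and 8x rows share the same batch size and GPU version. The names `DGX1_THROUGHPUT` and `multi_gpu_efficiency` are hypothetical, and the 1x and 8x rows may have been measured under conditions the table does not record, so the output should be treated as indicative only.

```python
# DGX-1 rows from the V100 table where the 1x and 8x entries use the same
# batch size and GPU version.
# (framework, network) -> (1x V100 throughput, 8x V100 throughput)
DGX1_THROUGHPUT = {
    ("MXNet", "Inception V3"): (527, 4105),             # images/sec
    ("PyTorch", "ResNet-50 v1.5"): (928, 7288),         # images/sec
    ("TensorFlow", "ResNet-50 v1.5"): (840, 6474),      # images/sec
    ("PyTorch", "GNMT V2"): (75975, 582605),            # total tokens/sec
    ("PyTorch", "NCF"): (22093850, 104122673),          # samples/sec
}


def multi_gpu_efficiency(throughput, gpus=8):
    """Map each key to (multi-GPU throughput) / (gpus * single-GPU throughput)."""
    return {
        key: multi / (gpus * single)
        for key, (single, multi) in throughput.items()
    }


if __name__ == "__main__":
    for (framework, network), eff in multi_gpu_efficiency(DGX1_THROUGHPUT).items():
        print(f"{framework:<10} {network:<15} {eff:.1%} of ideal 8x scaling")
```

On these rows the image and translation networks stay close to ideal scaling, while NCF falls well below it.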

T4 Training Performance

Framework | Network | Network Type | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | Inception V3 | CNN | 174 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | Inception V3 | CNN | 1381 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 v1.5 | CNN | 446 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 208 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 v1.5 | CNN | 4116 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 208 | ImageNet2012 | Tesla T4
PyTorch | Inception V3 | CNN | 176 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
PyTorch | Inception V3 | CNN | 1337 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
PyTorch | Mask R-CNN | CNN | 6 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 4 | COCO2014 | Tesla T4
PyTorch | Mask R-CNN | CNN | 38 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 4 | COCO2014 | Tesla T4
PyTorch | ResNet-50 v1.5 | CNN | 286 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 v1.5 | CNN | 2295 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | SSD v1.1 | CNN | 76 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 64 | COCO 2017 | Tesla T4
PyTorch | SSD v1.1 | CNN | 622 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 64 | COCO 2017 | Tesla T4
PyTorch | Tacotron2 | CNN | 11236 total input tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | Tesla T4
PyTorch | Tacotron2 | CNN | 78803 total input tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | Tesla T4
PyTorch | WaveGlow | CNN | 32167 output samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 8 | LJ Speech 1.1 | Tesla T4
PyTorch | WaveGlow | CNN | 250173 output samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | Tesla T4
TensorFlow | Inception V3 | CNN | 177 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | Inception V3 | CNN | 1345 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 v1.5 | CNN | 263 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 v1.5 | CNN | 2057 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
TensorFlow | SSD v1.1 | CNN | 51 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 32 | COCO 2017 | Tesla T4
TensorFlow | SSD v1.1 | CNN | 281 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | Mixed | 32 | COCO 2017 | Tesla T4
TensorFlow | U-Net Industrial | CNN | 28 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 16 | DAGM2007 | Tesla T4
TensorFlow | U-Net Industrial | CNN | 190 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 2 | DAGM2007 | Tesla T4
PyTorch | GNMT V2 | RNN | 25288 total tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | WMT16 English-German | Tesla T4
PyTorch | GNMT V2 | RNN | 182072 total tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | WMT16 English-German | Tesla T4
TensorFlow | GNMT V2 | RNN | 9679 total tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 192 | WMT16 English-German | Tesla T4
TensorFlow | GNMT V2 | RNN | 57464 total tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 192 | WMT16 English-German | Tesla T4
PyTorch | NCF | Recommender | 7584587 samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | Tesla T4
PyTorch | NCF | Recommender | 27011297 samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | Tesla T4
TensorFlow | NCF | Recommender | 10297010 samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | Tesla T4
TensorFlow | NCF | Recommender | 16872638 samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | Tesla T4
TensorFlow | BERT | Attention | 9 sentences/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 3 | SQuAD v1.1 | Tesla T4
TensorFlow | BERT | Attention | 30 sentences/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 3 | SQuAD v1.1 | Tesla T4
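
The V100 and T4 tables can also be lined up against each other. As a rough illustration, the sketch below takes four PyTorch networks whose 8-GPU rows use the same batch size in both tables and computes the ratio of 8x V100 (DGX-1) throughput to 8x T4 throughput. The names `EIGHT_GPU_PYTORCH` and `v100_to_t4_ratio` are hypothetical. The servers, CPUs and in some cases container versions differ between the two systems, so these ratios describe the listed configurations, not the GPUs in isolation.

```python
# 8-GPU PyTorch rows that share a batch size across the V100 (DGX-1) and T4
# tables. network -> (8x V100 throughput, 8x T4 throughput), same unit per row
EIGHT_GPU_PYTORCH = {
    "ResNet-50 v1.5": (7288, 2295),        # images/sec, batch size 256
    "SSD v1.1": (2018, 622),               # images/sec, batch size 64
    "GNMT V2": (582605, 182072),           # total tokens/sec, batch size 128
    "NCF": (104122673, 27011297),          # samples/sec, batch size 1048576
}


def v100_to_t4_ratio(rows):
    """Return network -> V100 throughput divided by T4 throughput."""
    return {network: v100 / t4 for network, (v100, t4) in rows.items()}


if __name__ == "__main__":
    for network, ratio in v100_to_t4_ratio(EIGHT_GPU_PYTORCH).items():
        print(f"{network:<15} 8x V100 is {ratio:.2f}x the 8x T4 throughput")
```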