

Deep Learning Training

NVIDIA’s complete solution stack, from GPUs to libraries and containers on NVIDIA GPU Cloud (NGC), allows data scientists to get up and running with deep learning quickly. NVIDIA® Tesla® V100 Tensor Core GPUs use mixed precision to accelerate deep learning training throughput across every framework and every type of neural network. NVIDIA took the top spot in all six benchmarks it submitted to MLPerf, the first industry-wide AI benchmark, a testament to our GPU-accelerated platform approach.
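Most results on this page are reported at "Mixed" precision, where much of the math runs in FP16. FP16 has a limited range, so mixed-precision training is usually paired with loss scaling. The Python sketch below is our own illustration and not NVIDIA's implementation. It uses the standard library's `struct` half-precision format to show a hypothetical gradient of 1e-8 underflowing to zero in FP16. The same gradient survives if it is first multiplied by a scale factor and then divided back out in full precision. The scale factor of 1024 is an arbitrary assumption.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 binary16 and back.

    struct's "e" format packs half precision; it raises OverflowError for
    magnitudes above the FP16 maximum of 65504.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_roundtrip(value: float, scale: float) -> float:
    """Scale a value, store it in FP16, then unscale in full precision."""
    return to_fp16(value * scale) / scale


tiny_grad = 1e-8  # hypothetical gradient, below FP16's smallest subnormal (~6e-8)
print(to_fp16(tiny_grad))                   # 0.0: the gradient underflows
print(scaled_roundtrip(tiny_grad, 1024.0))  # about 1.001e-08: it survives
```

The scale factor only needs to push small values into FP16's representable range without pushing large ones past 65504.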

NVIDIA Performance on MLPerf AI Benchmarks

ResNet-50 Time to Solution on V100

MXNet | Batch Size: refer to the MLPerf table below | 18.11-py3 | Precision: Mixed | Dataset: ImageNet2012 | Convergence criteria: refer to MLPerf requirements

Training Image Classification on CNNs

ResNet-50 V1.5 Throughput on NVIDIA Tesla V100

DGX-1: 8x Tesla V100 32GB, E5-2698 v4 2.2 GHz (PyTorch, TensorFlow), DGX-2: 8x Tesla V100 32GB, Platinum 8168 2.7 GHz (MXNet) | Batch Size = 256 for PyTorch, 128 for all others | 19.05-py3 | Precision: Mixed | Dataset: ImageNet2012
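Throughput figures translate directly into training time. Here is a back-of-the-envelope sketch. It assumes the standard ILSVRC2012 training split of 1,281,167 images and a steady throughput, and it ignores data-loading stalls, evaluation passes, and warm-up. Under those assumptions, one epoch takes the dataset size divided by images per second. The inputs are the 8-GPU ResNet-50 V1.5 rows from the V100 training table, and the helper name `epoch_seconds` is hypothetical.

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC2012 training split (assumed: full split per epoch)


def epoch_seconds(images_per_sec: float, dataset_size: int = IMAGENET_TRAIN_IMAGES) -> float:
    """Wall-clock seconds for one pass over the dataset at a steady throughput."""
    if images_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return dataset_size / images_per_sec


# 8x V100 ResNet-50 V1.5 throughputs (images/sec) from the V100 training table
results = {"MXNet (DGX-2)": 11056, "PyTorch (DGX-1)": 6791, "TensorFlow (DGX-1)": 6295}
for name, ips in results.items():
    print(f"{name}: {epoch_seconds(ips):.0f} s per epoch")
```

These numbers are a lower bound on real epoch time, because the sketch assumes the GPUs never wait on input or evaluation.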

 
 

ResNet-50 V1.5 Throughput on NVIDIA Tesla T4

Supermicro SYS-4029GP-TRT T4: 8x Tesla T4 16GB, Gold 6140 2.3 GHz | Batch Size = 208 for MXNet, 256 for PyTorch, 128 for TensorFlow | 19.05-py3 | Precision: Mixed | Dataset: ImageNet2012

 

Training Performance

NVIDIA Performance on MLPerf AI Benchmarks

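Framework | Network | Network Type | Time to Solution | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | ResNet-50 v1.5 | CNN | 135 minutes | 8x V100 | DGX-1V | 18.11-py3 | Mixed | 208 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 73.9 minutes | 16x V100 | DGX-2 | 18.11-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 70 minutes | 16x V100 | DGX-2H | 18.11-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 6.3 minutes | 640x V100 | DGX-1V Saturn | 18.11-py3 | Mixed | 26 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 7.4 minutes | 512x V100 | DGX-2H | 18.11-py3 | Mixed | 32 | ImageNet2012 | V100-SXM3-32GB
PyTorch | SSD | CNN | 27 minutes | 8x V100 | DGX-1V | 18.11-py3 | Mixed | 152 | COCO2017 | V100-SXM2-16GB
PyTorch | SSD | CNN | 15.9 minutes | 16x V100 | DGX-2 | 18.11-py3 | Mixed | 128 | COCO2017 | V100-SXM3-32GB
PyTorch | SSD | CNN | 14.1 minutes | 16x V100 | DGX-2H | 18.11-py3 | Mixed | 128 | COCO2017 | V100-SXM3-32GB
PyTorch | SSD | CNN | 6.5 minutes | 64x V100 | DGX-1V Saturn | 18.11-py3 | Mixed | 32 | COCO2017 | V100-SXM2-16GB
PyTorch | SSD | CNN | 5.6 minutes | 64x V100 | DGX-2H | 18.11-py3 | Mixed | 32 | COCO2017 | V100-SXM3-32GB
PyTorch | Mask R-CNN | CNN | 323 minutes | 8x V100 | DGX-1V | 18.11-py3 | Mixed | 4 | COCO2014 | V100-SXM2-16GB
PyTorch | Mask R-CNN | CNN | 176.3 minutes | 16x V100 | DGX-2 | 18.11-py3 | Mixed | 4 | COCO2014 | V100-SXM3-32GB
PyTorch | Mask R-CNN | CNN | 166.9 minutes | 16x V100 | DGX-2H | 18.11-py3 | Mixed | 4 | COCO2014 | V100-SXM3-32GB
PyTorch | Mask R-CNN | CNN | 81 minutes | 64x V100 | DGX-1V Saturn | 18.11-py3 | Mixed | 2 | COCO2014 | V100-SXM2-16GB
PyTorch | Mask R-CNN | CNN | 72.1 minutes | 64x V100 | DGX-2H | 18.11-py3 | Mixed | 2 | COCO2014 | V100-SXM3-32GB
PyTorch | NCF | CNN | 0.47 minutes | 8x V100 | DGX-1V | 18.11-py3 | Mixed | 131072 | MovieLens 20 Million | V100-SXM2-16GB
PyTorch | NCF | CNN | 0.4 minutes | 16x V100 | DGX-2 | 18.11-py3 | Mixed | 65536 | MovieLens 20 Million | V100-SXM3-32GB
PyTorch | NCF | CNN | 0.4 minutes | 16x V100 | DGX-2H | 18.11-py3 | Mixed | 65536 | MovieLens 20 Million | V100-SXM3-32GB
PyTorch | GNMT | RNN | 18 minutes | 8x V100 | DGX-1V | 18.11-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-16GB
PyTorch | GNMT | RNN | 10.5 minutes | 16x V100 | DGX-2 | 18.11-py3 | Mixed | 64 | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT | RNN | 9.8 minutes | 16x V100 | DGX-2H | 18.11-py3 | Mixed | 64 | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT | RNN | 2.8 minutes | 256x V100 | DGX-1V Saturn | 18.11-py3 | Mixed | 32 | WMT16 English-German | V100-SXM2-16GB
PyTorch | GNMT | RNN | 2.7 minutes | 256x V100 | DGX-2H | 18.11-py3 | Mixed | 32 | WMT16 English-German | V100-SXM3-32GB
PyTorch | Transformer | Attention | 33 minutes | 8x V100 | DGX-1V | 18.11-py3 | Mixed | 5120 | WMT14 English-German | V100-SXM2-16GB
PyTorch | Transformer | Attention | 21.2 minutes | 16x V100 | DGX-2 | 18.11-py3 | Mixed | 10240 | WMT14 English-German | V100-SXM3-32GB
PyTorch | Transformer | Attention | 19.2 minutes | 16x V100 | DGX-2H | 18.11-py3 | Mixed | 10240 | WMT14 English-German | V100-SXM3-32GB
PyTorch | Transformer | Attention | 6.2 minutes | 192x V100 | DGX-1V Saturn | 18.11-py3 | Mixed | 2560 | WMT14 English-German | V100-SXM2-16GB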
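The MLPerf rows can be compared across scales. This sketch takes two pairs of rows from the MLPerf table: MXNet ResNet-50 v1.5 on 8 versus 640 GPUs, and PyTorch GNMT on 8 versus 256 GPUs. For each pair it computes the speedup, the scaling efficiency (speedup divided by the increase in GPU count), and the total GPU-hours. Treat the efficiency figure as rough. Per-GPU batch sizes differ between rows, and time to solution depends on convergence as well as raw throughput. The function and field names are our own.

```python
from typing import NamedTuple


class Scaling(NamedTuple):
    speedup: float
    efficiency: float  # speedup / GPU-count ratio; 1.0 would be perfectly linear
    gpu_hours: float   # GPUs x wall-clock hours for the larger run


def compare_runs(base_gpus: int, base_minutes: float, gpus: int, minutes: float) -> Scaling:
    speedup = base_minutes / minutes
    efficiency = speedup / (gpus / base_gpus)
    gpu_hours = gpus * minutes / 60.0
    return Scaling(speedup, efficiency, gpu_hours)


# (label, base GPUs, base minutes, scaled GPUs, scaled minutes) from the MLPerf table
runs = [
    ("ResNet-50 v1.5 (MXNet)", 8, 135.0, 640, 6.3),
    ("GNMT (PyTorch)", 8, 18.0, 256, 2.8),
]
for label, g0, t0, g1, t1 in runs:
    s = compare_runs(g0, t0, g1, t1)
    print(f"{label}: {s.speedup:.1f}x faster, {s.efficiency:.0%} efficient, "
          f"{s.gpu_hours:.1f} GPU-hours vs {g0 * t0 / 60.0:.1f}")
```

On these rows, the largest runs finish roughly 21x and 6x sooner but consume several times more GPU-hours. That is the usual trade when scaling out to minimize time to solution.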

V100 Training Performance

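Framework | Network | Network Type | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | Inception V3 | CNN | 511 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | Inception V3 | CNN | 584 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | Inception V3 | CNN | 4063 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | Inception V3 | CNN | 4520 images/sec | 8x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 | CNN | 1409 images/sec | 1x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 | CNN | 1442 images/sec | 1x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 | CNN | 10380 images/sec | 8x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 | CNN | 10530 images/sec | 8x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 V1.5 | CNN | 1419 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 208 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 V1.5 | CNN | 1580 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 V1.5 | CNN | 11056 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 V1.5 | CNN | 11507 images/sec | 8x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | Inception V3 | CNN | 537 images/sec | 1x V100 | DGX-1 | 19.03-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | Inception V3 | CNN | 572 images/sec | 1x V100 | DGX-2 | 19.03-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
PyTorch | Inception V3 | CNN | 4156 images/sec | 8x V100 | DGX-1 | 19.03-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | Mask R-CNN | CNN | 14 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 16 | COCO2014 | V100-SXM2-32GB
PyTorch | Mask R-CNN | CNN | 17 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 16 | COCO2014 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 84 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 16 | COCO2014 | V100-SXM2-32GB
PyTorch | NCF | CNN | 19839034 samples/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
PyTorch | NCF | CNN | 23398907 samples/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM3-32GB-H
PyTorch | NCF | CNN | 95524496 samples/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
PyTorch | NCF | CNN | 106267011 samples/sec | 8x V100 | DGX-2H | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM3-32GB-H
PyTorch | ResNet-50 | CNN | 849 images/sec | 1x V100 | DGX-1 | 18.10-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | ResNet-50 | CNN | 898 images/sec | 1x V100 | DGX-2 | 18.10-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
PyTorch | ResNet-50 | CNN | 6675 images/sec | 8x V100 | DGX-1 | 18.10-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | ResNet-50 V1.5 | CNN | 855 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | ResNet-50 V1.5 | CNN | 920 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM3-32GB
PyTorch | ResNet-50 V1.5 | CNN | 6791 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | SSD | CNN | 257 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 64 | COCO 2017 | V100-SXM2-32GB
PyTorch | SSD | CNN | 275 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 64 | COCO 2017 | V100-SXM3-32GB
PyTorch | SSD | CNN | 2017 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 64 | COCO 2017 | V100-SXM2-32GB
PyTorch | Tacotron2 | CNN | 2553 tokens/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM2-16GB
PyTorch | Tacotron2 | CNN | 2745 tokens/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB
PyTorch | Tacotron2 | CNN | 3054 tokens/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB-H
PyTorch | Tacotron2 | CNN | 17185 tokens/sec | 8x V100 | DGX-1 | 19.04-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM2-32GB
PyTorch | Tacotron2 | CNN | 18265 tokens/sec | 8x V100 | DGX-2 | 19.04-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB
PyTorch | WaveGlow | CNN | 73120 output samples/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM2-32GB
PyTorch | WaveGlow | CNN | 83975 output samples/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM3-32GB
PyTorch | WaveGlow | CNN | 533364 output samples/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM2-32GB
TensorFlow | Inception V3 | CNN | 537 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | Inception V3 | CNN | 572 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
TensorFlow | Inception V3 | CNN | 4078 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | NCF | CNN | 24393781 samples/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
TensorFlow | NCF | CNN | 65260213 samples/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
TensorFlow | ResNet-50 | CNN | 857 images/sec | 1x V100 | DGX-1 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | ResNet-50 | CNN | 921 images/sec | 1x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
TensorFlow | ResNet-50 | CNN | 6704 images/sec | 8x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
TensorFlow | ResNet-50 | CNN | 1000 images/sec | 8x V100 | DGX-2H | 19.01-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
TensorFlow | ResNet-50 | CNN | 7022 images/sec | 8x V100 | DGX-2H | 19.01-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
TensorFlow | ResNet-50 V1.5 | CNN | 785 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | ResNet-50 V1.5 | CNN | 846 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM3-32GB
TensorFlow | ResNet-50 V1.5 | CNN | 6295 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | SSD | CNN | 123 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM2-32GB
TensorFlow | SSD | CNN | 136 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM3-32GB
TensorFlow | SSD | CNN | 665 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM2-32GB
TensorFlow | SSD | CNN | 770 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM3-32GB
TensorFlow | U-Net Industrial | CNN | 100 images/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 16 | DAGM2007 | V100-SXM2-32GB
TensorFlow | U-Net Industrial | CNN | 107 images/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 16 | DAGM2007 | V100-SXM3-32GB
TensorFlow | U-Net Industrial | CNN | 517 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 2 | DAGM2007 | V100-SXM2-32GB
TensorFlow | U-Net Industrial | CNN | 546 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 2 | DAGM2007 | V100-SXM3-32GB
PyTorch | GNMT V2 | RNN | 78347 tokens/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-32GB
PyTorch | GNMT V2 | RNN | 83860 tokens/sec | 1x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT V2 | RNN | 598342 tokens/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-32GB
TensorFlow | GNMT V2 | RNN | 24899 tokens/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 192 | WMT16 English-German | V100-SXM2-16GB
TensorFlow | GNMT V2 | RNN | 25733 tokens/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 192 | WMT16 English-German | V100-SXM3-32GB-H
TensorFlow | GNMT V2 | RNN | 138216 tokens/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 192 | WMT16 English-German | V100-SXM2-32GB
TensorFlow | BERT | Attention | 29 sentences/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM2-32GB
TensorFlow | BERT | Attention | 34.5 sentences/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM3-32GB-H
TensorFlow | BERT | Attention | 161.6 sentences/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM3-32GB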
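One common way to read the 1x and 8x rows together is scaling efficiency: the 8-GPU throughput divided by eight times the single-GPU throughput. The sketch applies it only to rows where both measurements come from the same DGX-1 server and container version. Comparing across servers or containers would mix in other effects. The function name is our own.

```python
def scaling_efficiency(one_gpu: float, n_gpu: float, n: int) -> float:
    """Fraction of ideal linear scaling achieved by n GPUs."""
    return n_gpu / (n * one_gpu)


# (label, 1x V100 throughput, 8x V100 throughput); both rows on DGX-1
pairs = [
    ("MXNet Inception V3", 511, 4063),
    ("PyTorch Inception V3", 537, 4156),
    ("TensorFlow Inception V3", 537, 4078),
    ("PyTorch GNMT V2", 78347, 598342),
]
for label, one, eight in pairs:
    print(f"{label}: {scaling_efficiency(one, eight, 8):.1%}")
```

On these rows, the 8-GPU runs reach roughly 95 to 99 percent of linear scaling.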

T4 Training Performance

Framework | Network | Network Type | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | Inception V3 | CNN | 172 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | Inception V3 | CNN | 1359 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 | CNN | 425 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 | CNN | 3329 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 V1.5 | CNN | 447 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 208 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 V1.5 | CNN | 4116 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 208 | ImageNet2012 | Tesla T4
PyTorch | Inception V3 | CNN | 164 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
PyTorch | Inception V3 | CNN | 1253 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.03-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
PyTorch | Mask R-CNN | CNN | 6 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 4 | ImageNet2012 | Tesla T4
PyTorch | Mask R-CNN | CNN | 39 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 4 | ImageNet2012 | Tesla T4
PyTorch | NCF | CNN | 7134482 samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 1048576 | ImageNet2012 | Tesla T4
PyTorch | NCF | CNN | 25428142 samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 1048576 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 | CNN | 252 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 | CNN | 2054 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.01-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 V1.5 | CNN | 282 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 V1.5 | CNN | 2241 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | SSD | CNN | 86 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 64 | COCO 2017 | Tesla T4
PyTorch | SSD | CNN | 693 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 64 | COCO 2017 | Tesla T4
PyTorch | Tacotron2 | CNN | 1466 tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | Tesla T4
PyTorch | Tacotron2 | CNN | 9390 tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 80 | LJ Speech 1.1 | Tesla T4
PyTorch | WaveGlow | CNN | 33256 output samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | Tesla T4
PyTorch | WaveGlow | CNN | 250173 output samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | Tesla T4
TensorFlow | Inception V3 | CNN | 181 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | Inception V3 | CNN | 1358 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | NCF | CNN | 9573658 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 1048576 | ImageNet2012 | Tesla T4
TensorFlow | NCF | CNN | 19069773 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 1048576 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 | CNN | 294 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 | CNN | 2262 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 V1.5 | CNN | 272 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 V1.5 | CNN | 2151 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | SSD | CNN | 52 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 32 | COCO 2017 | Tesla T4
TensorFlow | SSD | CNN | 281 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 32 | COCO 2017 | Tesla T4
TensorFlow | U-Net Industrial | CNN | 29 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 16 | DAGM2007 | Tesla T4
TensorFlow | U-Net Industrial | CNN | 196 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 2 | DAGM2007 | Tesla T4
PyTorch | GNMT V2 | RNN | 25992 tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | WMT16 English-German | Tesla T4
PyTorch | GNMT V2 | RNN | 184525 tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 128 | WMT16 English-German | Tesla T4
TensorFlow | GNMT V2 | RNN | 9654 tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 192 | WMT16 English-German | Tesla T4
TensorFlow | GNMT V2 | RNN | 53570 tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 192 | WMT16 English-German | Tesla T4
TensorFlow | BERT | Attention | 8 sentences/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 3 | SQuAD v1.1 | Tesla T4
TensorFlow | BERT | Attention | 31 sentences/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 3 | SQuAD v1.1 | Tesla T4

 

NVIDIA Deep Learning Inference Performance

NVIDIA® TensorRT™ running on NVIDIA GPUs enables the most efficient deep learning inference performance across multiple application areas and models. This versatility gives data scientists wide latitude to create the optimal low-latency solution. Visit NVIDIA GPU Cloud (NGC) to download any of these containers.

NVIDIA Tesla® V100 Tensor Core GPUs use mixed precision to combine high throughput with low latency across every type of neural network. Tesla P4 is an inference GPU designed for optimal power consumption and latency in ultra-efficient scale-out servers. Read the inference whitepaper to learn more about NVIDIA’s inference platform.

Measuring inference performance involves balancing many variables. PLASTER is an acronym for the key elements of deep learning performance: Programmability, Latency, Accuracy, Size of model, Throughput, Energy efficiency, and Rate of learning. Each factor must be considered to arrive at the right set of tradeoffs and produce a successful deep learning implementation. Refer to NVIDIA’s PLASTER whitepaper for more details.
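Latency, throughput, and batch size are linked. If batches are processed back to back on a single stream, throughput is roughly batch size divided by per-batch latency. That simplification is ours, because the page does not describe its measurement method. The sketch checks the relation against the V100 GoogleNet rows in the inference table, with latency in milliseconds. The function name is hypothetical.

```python
def implied_throughput(batch_size: int, latency_ms: float) -> float:
    """Items per second if each batch of `batch_size` items takes `latency_ms`."""
    return batch_size * 1000.0 / latency_ms


# (batch size, reported images/sec, reported latency in ms): V100 GoogleNet rows
rows = [(1, 1579, 0.63), (2, 2129, 0.94), (8, 5140, 1.6), (128, 12345, 10.0)]
for batch, reported, latency in rows:
    estimate = implied_throughput(batch, latency)
    print(f"batch {batch:>3}: estimated {estimate:8.0f}, reported {reported:6d}")
```

The estimates land within about 4 percent of the reported figures, a gap consistent with the rounded latency values. The same relation shows the latency-throughput tradeoff in PLASTER: larger batches raise throughput, but every request in a batch waits for the full batch latency.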

Inference Image Classification on CNNs with TensorRT

ResNet-50 Throughput

DGX-1: 1x Tesla V100-SXM2-16GB, E5-2698 v4 2.2 GHz | TensorRT 5.1 | Batch Size = 128 | 19.05-py2 | Precision: Mixed | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 128 | 19.05-py3 | Precision: INT8 | Dataset: Synthetic

 
 

ResNet-50 Latency

DGX-1: 1x Tesla V100-SXM2-16GB, Platinum 8168 2.7 GHz | TensorRT 5.1 | Batch Size = 1 | 19.05-py3 | Precision: INT8 | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 1 | 19.05-py3 | Precision: INT8 | Dataset: Synthetic

 
 

ResNet-50 Power Efficiency

DGX-1: 1x Tesla V100-SXM2-16GB, E5-2698 v4 2.2 GHz | TensorRT 5.1 | Batch Size = 128 | 19.05-py3 | Precision: Mixed | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 128 | 19.05-py3 | Precision: INT8 | Dataset: Synthetic
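Efficiency is reported as throughput per watt, so dividing throughput by efficiency recovers the approximate average board power during each run. This is our own derivation, not a figure NVIDIA publishes, and it inherits the rounding of the efficiency column. The inputs are the batch-128 ResNet-50 rows from the V100 and T4 inference tables. The function name is hypothetical.

```python
def implied_watts(throughput: float, per_watt: float) -> float:
    """Average power implied by throughput (items/s) and efficiency (items/s/W)."""
    if per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return throughput / per_watt


print(f"V100 ResNet-50, batch 128: ~{implied_watts(7636, 27):.0f} W")  # Mixed precision row
print(f"T4 ResNet-50, batch 128:   ~{implied_watts(5106, 74):.0f} W")  # INT8 row
```

The V100 figure is roughly four times the T4 figure. That is why T4 comes out ahead on images per second per watt despite its lower absolute throughput.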

 

Inference Performance

V100 Inference Performance

Network | Network Type | Batch Size | Throughput | Efficiency | Latency (ms) | GPU | Server | Container | Precision | Dataset | GPU Version
GoogleNet | CNN | 1 | 1579 images/sec | 14 images/sec/watt | 0.63 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 1 | 1686 images/sec | 12 images/sec/watt | 0.59 | 1x V100 | DGX-2 | 19.05-py3 | INT8 | Synthetic | V100-SXM3-32GB
GoogleNet | CNN | 2 | 2129 images/sec | 18 images/sec/watt | 0.94 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 8 | 5140 images/sec | 35 images/sec/watt | 1.6 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 82 | 11805 images/sec | 44 images/sec/watt | 7 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 128 | 12345 images/sec | 45 images/sec/watt | 10 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 1 | 1737 images/sec | 18 images/sec/watt | 0.58 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 2 | 2886 images/sec | 30 images/sec/watt | 0.69 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 8 | 8848 images/sec | 63 images/sec/watt | 0.9 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 32 | 17649 images/sec | 77 images/sec/watt | 1.8 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V2 | CNN | 128 | 23262 images/sec | 81 images/sec/watt | 5.5 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 1 | 1118 images/sec | 8.2 images/sec/watt | 0.89 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 1 | 1176 images/sec | 7.1 images/sec/watt | 0.85 | 1x V100 | DGX-2 | 19.05-py3 | INT8 | Synthetic | V100-SXM3-32GB
ResNet-50 | CNN | 2 | 1551 images/sec | 11 images/sec/watt | 1.3 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 8 | 3308 images/sec | 21 images/sec/watt | 2.4 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 39 | 5821 images/sec | 22 images/sec/watt | 6.7 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 128 | 7636 images/sec | 27 images/sec/watt | 17 | 1x V100 | DGX-2 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50v1.5 | CNN | 1 | 934 images/sec | 7.1 images/sec/watt | 1.1 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50v1.5 | CNN | 2 | 1396 images/sec | 9.8 images/sec/watt | 1.4 | 1x V100 | DGX-1 | 19.05-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50v1.5 | CNN | 8 | 3239 images/sec | 20 images/sec/watt | 2.5 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50v1.5 | CNN | 128 | 7309 images/sec | 25 images/sec/watt | 18 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 1 | 724 images/sec | 3.8 images/sec/watt | 1.4 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 2 | 1133 images/sec | 5.5 images/sec/watt | 1.8 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 8 | 2044 images/sec | 8 images/sec/watt | 3.9 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 128 | 2855 images/sec | 9.9 images/sec/watt | 45 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
NMT | RNN | 1 | 4013 tokens/sec | - | 13 | 1x V100 | DGX-1 | - | Mixed | wmt16-English-German | V100-SXM2-32GB
NMT | RNN | 2 | 6290 tokens/sec | - | 16 | 1x V100 | DGX-1 | - | Mixed | wmt16-English-German | V100-SXM2-32GB
NMT | RNN | 64 | 56531 tokens/sec | - | 58 | 1x V100 | DGX-1 | - | Mixed | wmt16-English-German | V100-SXM2-32GB
NMT | RNN | 128 | 73375 tokens/sec | - | 89 | 1x V100 | DGX-1 | - | Mixed | wmt16-English-German | V100-SXM2-32GB
Deep Recommender | Recommender | 1 | 5541 images/sec | 48 images/sec/watt | 0.18 | 1x V100 | DGX-1 | - | Mixed | Synthetic | V100-SXM2-16GB
Deep Recommender | Recommender | 2 | 10066 images/sec | 88 images/sec/watt | 0.2 | 1x V100 | DGX-1 | - | Mixed | Synthetic | V100-SXM2-16GB
Deep Recommender | Recommender | 64 | 282209 images/sec | 1874 images/sec/watt | 0.23 | 1x V100 | DGX-1 | - | Mixed | Synthetic | V100-SXM2-16GB
Deep Recommender | Recommender | 128 | 393347 images/sec | 2461 images/sec/watt | 0.33 | 1x V100 | DGX-1 | - | Mixed | Synthetic | V100-SXM2-16GB
NCF | Recommender | 1 | 21824 images/sec | 310 images/sec/watt | 0.05 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
NCF | Recommender | 64 | 1335854 images/sec | 16998 images/sec/watt | 0.05 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
NCF | Recommender | 25000 | 91680666 images/sec | 730549 images/sec/watt | 0.27 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB
NCF | Recommender | 100000 | 114022270 images/sec | 694185 images/sec/watt | 0.88 | 1x V100 | DGX-1 | 19.05-py3 | Mixed | Synthetic | V100-SXM2-16GB

TensorRT 5.1 was used for all networks except MobileNet V2 and Deep Recommender, which used TensorRT 5.0.

 

T4 Inference Performance

Network | Network Type | Batch Size | Throughput | Efficiency | Latency (ms) | GPU | Server | Container | Precision | Dataset | GPU Version
GoogleNet | CNN | 1 | 1687 images/sec | 25 images/sec/watt | 0.59 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 2 | 2356 images/sec | 35 images/sec/watt | 0.85 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 8 | 5458 images/sec | 78 images/sec/watt | 1.5 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 52 | 7335 images/sec | 105 images/sec/watt | 7.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 128 | 7516 images/sec | 108 images/sec/watt | 17 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 1 | 1766 images/sec | 34 images/sec/watt | 0.57 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 2 | 3235 images/sec | 55 images/sec/watt | 0.62 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 8 | 7251 images/sec | 106 images/sec/watt | 1.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 32 | 8869 images/sec | 129 images/sec/watt | 3.6 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
MobileNet V2 | CNN | 128 | 9059 images/sec | 131 images/sec/watt | 14 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 1 | 1065 images/sec | 15 images/sec/watt | 0.94 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 2 | 1725 images/sec | 25 images/sec/watt | 1.2 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 8 | 3778 images/sec | 55 images/sec/watt | 2.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 33 | 4723 images/sec | 68 images/sec/watt | 7 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 128 | 5106 images/sec | 74 images/sec/watt | 25 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 1 | 407 images/sec | 5.8 images/sec/watt | 2.5 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 2 | 656 images/sec | 9.5 images/sec/watt | 3.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 8 | 1379 images/sec | 20 images/sec/watt | 5.8 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 32 | 1598 images/sec | 23 images/sec/watt | 20 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.02-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 128 | 1895 images/sec | 27 images/sec/watt | 68 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | INT8 | Synthetic | Tesla T4
NMT | RNN | 1 | 2511 tokens/sec | 38 tokens/sec/watt | 20 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | wmt16-English-German | Tesla T4
NMT | RNN | 2 | 3768 tokens/sec | 58 tokens/sec/watt | 27 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | wmt16-English-German | Tesla T4
NMT | RNN | 8 | 9975 tokens/sec | 160 tokens/sec/watt | 41 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | wmt16-English-German | Tesla T4
NMT | RNN | 128 | 34124 tokens/sec | 647 tokens/sec/watt | 192 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | wmt16-English-German | Tesla T4
Deep Recommender | Recommender | 1 | 3252 images/sec | 47 images/sec/watt | 0.31 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | Synthetic | Tesla T4
Deep Recommender | Recommender | 2 | 6417 images/sec | 94 images/sec/watt | 0.31 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | Synthetic | Tesla T4
Deep Recommender | Recommender | 8 | 23650 images/sec | 344 images/sec/watt | 0.34 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | Synthetic | Tesla T4
Deep Recommender | Recommender | 128 | 219511 images/sec | 3186 images/sec/watt | 0.58 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | - | Mixed | Synthetic | Tesla T4
NCF | Recommender | 1 | 7623 images/sec | 274 images/sec/watt | 0.14 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | Synthetic | Tesla T4
NCF | Recommender | 64 | 488930 images/sec | 16539 images/sec/watt | 0.14 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | Synthetic | Tesla T4
NCF | Recommender | 25000 | 50381198 images/sec | 726308 images/sec/watt | 0.5 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | Synthetic | Tesla T4
NCF | Recommender | 100000 | 53374338 images/sec | 726308 images/sec/watt | 0.5 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | Synthetic | Tesla T4

TensorRT 5.1 was used for all networks except MobileNet V2, NMT, and Deep Recommender, which used TensorRT 5.0.
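One practical use of these tables is picking a batch size under a latency budget, which is the latency-versus-throughput tradeoff in PLASTER. The sketch filters the measured T4 ResNet-50 rows from the T4 inference table to those within a budget. It then keeps the row with the highest throughput. The budgets are hypothetical service-level targets, and the type and function names are our own. A real deployment would measure its own latency curve rather than reuse these rows.

```python
from typing import List, NamedTuple, Optional


class Measurement(NamedTuple):
    batch_size: int
    images_per_sec: float
    latency_ms: float


# T4 ResNet-50 INT8 rows from the T4 inference table
T4_RESNET50 = [
    Measurement(1, 1065, 0.94),
    Measurement(2, 1725, 1.2),
    Measurement(8, 3778, 2.1),
    Measurement(33, 4723, 7.0),
    Measurement(128, 5106, 25.0),
]


def best_under_budget(rows: List[Measurement], budget_ms: float) -> Optional[Measurement]:
    """Highest-throughput row whose latency fits the budget, or None if none fits."""
    feasible = [r for r in rows if r.latency_ms <= budget_ms]
    return max(feasible, key=lambda r: r.images_per_sec, default=None)


for budget in (0.5, 2.0, 7.0, 30.0):  # hypothetical latency budgets in ms
    print(budget, best_under_budget(T4_RESNET50, budget))
```

Throughput rises steeply at small batch sizes and then flattens. Relaxing the budget from 7 ms to 30 ms, for example, buys only about 8 percent more images per second.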

 

Last updated: June 19th, 2019