

NVIDIA’s complete solution stack, from GPUs to libraries and containers on NVIDIA GPU Cloud (NGC), allows data scientists to quickly get up and running with deep learning. NVIDIA® Tesla® V100 Tensor Core GPUs leverage mixed precision to accelerate deep learning training throughput across every framework and every type of neural network. NVIDIA breaks performance records on MLPerf, the first industry-wide AI benchmark, a testament to our GPU-accelerated platform approach.

NVIDIA Performance on MLPerf 0.6 AI Benchmarks

ResNet-50 v1.5 Time to Solution on V100

MXNet | Batch Size: refer to the CNN V100 Training table below | Precision: Mixed | Dataset: ImageNet2012 | Convergence criteria: refer to MLPerf requirements

Training Image Classification on CNNs

ResNet-50 V1.5 Throughput on V100

DGX-1: 8x Tesla V100-SXM2-32GB, E5-2698 v4 2.2 GHz | Batch Size = 256 | MXNet = 19.06-py3, TensorFlow and PyTorch = 19.07-py3 | Precision: Mixed | Dataset: ImageNet2012

ResNet-50 V1.5 Throughput on T4

Supermicro SYS-4029GP-TRT T4: 8x Tesla T4 16GB, Gold 6140 2.3 GHz | Batch Size: MXNet = 208, PyTorch = 256, TensorFlow = 128 | MXNet and TensorFlow = 19.05-py3, PyTorch = 19.07-py3 | Precision: Mixed | Dataset: ImageNet2012

Training Performance

NVIDIA Performance on MLPerf 0.6 AI Benchmarks

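Framework | Network | Network Type | Time to Solution | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version
MXNet | ResNet-50 v1.5 | CNN | 115.22 minutes | 8x V100 | DGX-1 | 0.6-8 | Mixed | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 57.87 minutes | 16x V100 | DGX-2 | 0.6-17 | Mixed | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 52.74 minutes | 16x V100 | DGX-2H | 0.6-19 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 2.59 minutes | 512x V100 | DGX-2H | 0.6-29 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 1.69 minutes | 1040x V100 | DGX-1 | 0.6-16 | Mixed | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 1.33 minutes | 1536x V100 | DGX-2H | 0.6-30 | Mixed | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 22.36 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | SSD-ResNet-34 | CNN | 12.21 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | COCO2017 | V100-SXM3-32GB
PyTorch | SSD-ResNet-34 | CNN | 11.41 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 4.78 minutes | 64x V100 | DGX-2H | 0.6-21 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 2.67 minutes | 240x V100 | DGX-1 | 0.6-13 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | SSD-ResNet-34 | CNN | 2.56 minutes | 240x V100 | DGX-2H | 0.6-24 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | SSD-ResNet-34 | CNN | 2.23 minutes | 240x V100 | DGX-2H | 0.6-27 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 207.48 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | Mask R-CNN | CNN | 101 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | COCO2017 | V100-SXM3-32GB
PyTorch | Mask R-CNN | CNN | 95.2 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 32.72 minutes | 64x V100 | DGX-2H | 0.6-21 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 22.03 minutes | 192x V100 | DGX-1 | 0.6-12 | Mixed | COCO2017 | V100-SXM2-16GB
PyTorch | Mask R-CNN | CNN | 18.47 minutes | 192x V100 | DGX-2H | 0.6-23 | Mixed | COCO2017 | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 20.55 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | WMT16 English-German | V100-SXM2-16GB
PyTorch | GNMT | RNN | 10.94 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT | RNN | 9.87 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 2.12 minutes | 256x V100 | DGX-2H | 0.6-25 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | GNMT | RNN | 1.99 minutes | 384x V100 | DGX-1 | 0.6-14 | Mixed | WMT16 English-German | V100-SXM2-16GB
PyTorch | GNMT | RNN | 1.8 minutes | 384x V100 | DGX-2H | 0.6-26 | Mixed | WMT16 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 20.34 minutes | 8x V100 | DGX-1 | 0.6-9 | Mixed | WMT17 English-German | V100-SXM2-16GB
PyTorch | Transformer | Attention | 11.04 minutes | 16x V100 | DGX-2 | 0.6-18 | Mixed | WMT17 English-German | V100-SXM3-32GB
PyTorch | Transformer | Attention | 9.8 minutes | 16x V100 | DGX-2H | 0.6-20 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 2.41 minutes | 160x V100 | DGX-2H | 0.6-22 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
PyTorch | Transformer | Attention | 2.05 minutes | 480x V100 | DGX-1 | 0.6-15 | Mixed | WMT17 English-German | V100-SXM2-16GB
PyTorch | Transformer | Attention | 1.59 minutes | 480x V100 | DGX-2H | 0.6-28 | Mixed | WMT17 English-German | V100-SXM3-32GB-H
TensorFlow | MiniGo | Reinforcement Learning | 27.39 minutes | 8x V100 | DGX-1 | 0.6-10 | Mixed | N/A | V100-SXM2-16GB
TensorFlow | MiniGo | Reinforcement Learning | 13.57 minutes | 24x V100 | DGX-1 | 0.6-11 | Mixed | N/A | V100-SXM2-16GB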

Refer to MLPerf requirements for convergence criteria

V100 Training Performance

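Framework | Network | Network Type | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | Inception V3 | CNN | 527 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | Inception V3 | CNN | 606 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | Inception V3 | CNN | 4105 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | Inception V3 | CNN | 4648 images/sec | 8x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 | CNN | 1409 images/sec | 1x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 | CNN | 1442 images/sec | 1x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 | CNN | 10380 images/sec | 8x V100 | DGX-1 | 19.02-py3 | Mixed | 128 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 | CNN | 10530 images/sec | 8x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 1422 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 208 | ImageNet2012 | V100-SXM2-16GB
MXNet | ResNet-50 v1.5 | CNN | 1597 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
MXNet | ResNet-50 v1.5 | CNN | 9566 images/sec | 8x V100 | DGX-1 | 19.06-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
MXNet | ResNet-50 v1.5 | CNN | 11056 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 128 | ImageNet2012 | V100-SXM3-32GB
MXNet | ResNet-50 v1.5 | CNN | 11507 images/sec | 8x V100 | DGX-2H | 19.05-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | Inception V3 | CNN | 543 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | Inception V3 | CNN | 629 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | Inception V3 | CNN | 4156 images/sec | 8x V100 | DGX-1 | 19.03-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
PyTorch | Mask R-CNN | CNN | 14 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 4 | COCO2014 | V100-SXM2-32GB
PyTorch | Mask R-CNN | CNN | 17 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 16 | COCO2014 | V100-SXM3-32GB-H
PyTorch | Mask R-CNN | CNN | 88 images/sec | 8x V100 | DGX-1 | 19.06-py3 | Mixed | 16 | COCO2014 | V100-SXM2-32GB
PyTorch | ResNet-50 | CNN | 819 images/sec | 1x V100 | DGX-1 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
PyTorch | ResNet-50 | CNN | 820 images/sec | 1x V100 | DGX-2 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB
PyTorch | ResNet-50 | CNN | 6218 images/sec | 8x V100 | DGX-1 | 19.02-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
PyTorch | ResNet-50 v1.5 | CNN | 928 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
PyTorch | ResNet-50 v1.5 | CNN | 1036 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
PyTorch | ResNet-50 v1.5 | CNN | 7288 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
PyTorch | SSD v1.1 | CNN | 225 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 64 | COCO 2017 | V100-SXM2-32GB
PyTorch | SSD v1.1 | CNN | 299 images/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 64 | COCO 2017 | V100-SXM3-32GB-H
PyTorch | SSD v1.1 | CNN | 2018 images/sec | 8x V100 | DGX-1 | 19.06-py3 | Mixed | 64 | COCO 2017 | V100-SXM2-32GB
PyTorch | Tacotron2 | CNN | 11743 total input tokens/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM2-16GB
PyTorch | Tacotron2 | CNN | 16847 total input tokens/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB-H
PyTorch | Tacotron2 | CNN | 81410 total input tokens/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM2-32GB
PyTorch | Tacotron2 | CNN | 104447 total input tokens/sec | 8x V100 | DGX-2H | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | V100-SXM3-32GB-H
PyTorch | WaveGlow | CNN | 77780 output samples/sec | 1x V100 | DGX-1 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM2-16GB
PyTorch | WaveGlow | CNN | 91052 output samples/sec | 1x V100 | DGX-2H | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM3-32GB-H
PyTorch | WaveGlow | CNN | 533364 output samples/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | V100-SXM2-32GB
TensorFlow | Inception V3 | CNN | 542 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | Inception V3 | CNN | 626 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
TensorFlow | Inception V3 | CNN | 4097 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-32GB
TensorFlow | ResNet-50 v1.5 | CNN | 840 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
TensorFlow | ResNet-50 v1.5 | CNN | 965 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM3-32GB-H
TensorFlow | ResNet-50 v1.5 | CNN | 6474 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 256 | ImageNet2012 | V100-SXM2-16GB
TensorFlow | SSD v1.1 | CNN | 114 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 32 | COCO 2017 | V100-SXM2-32GB
TensorFlow | SSD v1.1 | CNN | 125 images/sec | 1x V100 | DGX-2 | 19.07-py3 | Mixed | 32 | COCO 2017 | V100-SXM3-32GB
TensorFlow | SSD v1.1 | CNN | 665 images/sec | 8x V100 | DGX-1 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM2-32GB
TensorFlow | SSD v1.1 | CNN | 770 images/sec | 8x V100 | DGX-2 | 19.05-py3 | Mixed | 32 | COCO 2017 | V100-SXM3-32GB
TensorFlow | U-Net Industrial | CNN | 95 images/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 16 | DAGM2007 | V100-SXM2-32GB
TensorFlow | U-Net Industrial | CNN | 110 images/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 16 | DAGM2007 | V100-SXM3-32GB-H
TensorFlow | U-Net Industrial | CNN | 492 images/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 2 | DAGM2007 | V100-SXM2-32GB
TensorFlow | U-Net Industrial | CNN | 515 images/sec | 8x V100 | DGX-2 | 19.07-py3 | Mixed | 2 | DAGM2007 | V100-SXM3-32GB
PyTorch | GNMT V2 | RNN | 75975 total tokens/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-32GB
PyTorch | GNMT V2 | RNN | 82766 total tokens/sec | 1x V100 | DGX-2 | 19.07-py3 | Mixed | 128 | WMT16 English-German | V100-SXM3-32GB
PyTorch | GNMT V2 | RNN | 582605 total tokens/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 128 | WMT16 English-German | V100-SXM2-32GB
TensorFlow | GNMT V2 | RNN | 22471 total tokens/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 192 | WMT16 English-German | V100-SXM2-16GB
TensorFlow | GNMT V2 | RNN | 26039 total tokens/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 192 | WMT16 English-German | V100-SXM3-32GB-H
TensorFlow | GNMT V2 | RNN | 149008 total tokens/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 192 | WMT16 English-German | V100-SXM2-16GB
PyTorch | NCF | Recommender | 22093850 samples/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-16GB
PyTorch | NCF | Recommender | 24473776 samples/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM3-32GB-H
PyTorch | NCF | Recommender | 104122673 samples/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-16GB
PyTorch | NCF | Recommender | 109969915 samples/sec | 8x V100 | DGX-2H | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM3-32GB-H
TensorFlow | NCF | Recommender | 26415693 samples/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-16GB
TensorFlow | NCF | Recommender | 56183685 samples/sec | 8x V100 | DGX-1 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | V100-SXM2-32GB
TensorFlow | BERT-LARGE | Attention | 32 sentences/sec | 1x V100 | DGX-1 | 19.07-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM2-32GB
TensorFlow | BERT-LARGE | Attention | 37 sentences/sec | 1x V100 | DGX-2H | 19.07-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM3-32GB-H
TensorFlow | BERT-LARGE | Attention | 147 sentences/sec | 8x V100 | DGX-1 | 19.06-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM2-32GB
TensorFlow | BERT-LARGE | Attention | 189 sentences/sec | 8x V100 | DGX-2H | 19.07-py3 | Mixed | 10 | SQuAD v1.1 | V100-SXM3-32GB-H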
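
One way to read the 1x and 8x rows together is as a multi-GPU scaling efficiency: the 8-GPU throughput divided by eight times the single-GPU throughput. The sketch below is illustrative only. The function name `scaling_efficiency` is hypothetical, and it assumes the two rows are otherwise comparable (same server, container, batch size, and GPU version), which holds for the MXNet Inception V3 rows on DGX-1 used here.

```python
def scaling_efficiency(single_gpu_rate: float, multi_gpu_rate: float, num_gpus: int) -> float:
    """Return multi-GPU throughput as a fraction of ideal linear scaling."""
    if single_gpu_rate <= 0 or num_gpus <= 0:
        raise ValueError("throughput and GPU count must be positive")
    return multi_gpu_rate / (single_gpu_rate * num_gpus)


# MXNet Inception V3 on DGX-1 (V100-SXM2-32GB, container 19.07-py3, batch size 256).
# Both values are images/sec taken from the V100 Training Performance table.
efficiency = scaling_efficiency(single_gpu_rate=527, multi_gpu_rate=4105, num_gpus=8)
print(f"8x V100 scaling efficiency: {efficiency:.1%}")
```

Rows measured with different containers or batch sizes (for example, MXNet ResNet-50 v1.5 on DGX-1) do not support a like-for-like comparison this way.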

T4 Training Performance

Framework | Network | Network Type | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version
MXNet | Inception V3 | CNN | 174 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | Inception V3 | CNN | 1381 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 v1.5 | CNN | 446 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 208 | ImageNet2012 | Tesla T4
MXNet | ResNet-50 v1.5 | CNN | 4116 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 208 | ImageNet2012 | Tesla T4
PyTorch | Inception V3 | CNN | 176 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
PyTorch | Inception V3 | CNN | 1337 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
PyTorch | Mask R-CNN | CNN | 6 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 4 | COCO2014 | Tesla T4
PyTorch | Mask R-CNN | CNN | 38 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 4 | COCO2014 | Tesla T4
PyTorch | ResNet-50 v1.5 | CNN | 286 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | ResNet-50 v1.5 | CNN | 2295 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
PyTorch | SSD v1.1 | CNN | 76 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 64 | COCO 2017 | Tesla T4
PyTorch | SSD v1.1 | CNN | 622 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 64 | COCO 2017 | Tesla T4
PyTorch | Tacotron2 | CNN | 11236 total input tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | Tesla T4
PyTorch | Tacotron2 | CNN | 78803 total input tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 80 | LJ Speech 1.1 | Tesla T4
PyTorch | WaveGlow | CNN | 32167 output samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 8 | LJ Speech 1.1 | Tesla T4
PyTorch | WaveGlow | CNN | 250173 output samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 8 | LJ Speech 1.1 | Tesla T4
TensorFlow | Inception V3 | CNN | 177 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | Inception V3 | CNN | 1345 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 v1.5 | CNN | 263 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.05-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
TensorFlow | ResNet-50 v1.5 | CNN | 2057 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 256 | ImageNet2012 | Tesla T4
TensorFlow | SSD v1.1 | CNN | 51 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 32 | COCO 2017 | Tesla T4
TensorFlow | SSD v1.1 | CNN | 281 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.06-py3 | Mixed | 32 | COCO 2017 | Tesla T4
TensorFlow | U-Net Industrial | CNN | 28 images/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 16 | DAGM2007 | Tesla T4
TensorFlow | U-Net Industrial | CNN | 190 images/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 2 | DAGM2007 | Tesla T4
PyTorch | GNMT V2 | RNN | 25288 total tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | WMT16 English-German | Tesla T4
PyTorch | GNMT V2 | RNN | 182072 total tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 128 | WMT16 English-German | Tesla T4
TensorFlow | GNMT V2 | RNN | 9679 total tokens/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 192 | WMT16 English-German | Tesla T4
TensorFlow | GNMT V2 | RNN | 57464 total tokens/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 192 | WMT16 English-German | Tesla T4
PyTorch | NCF | Recommender | 7584587 samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | Tesla T4
PyTorch | NCF | Recommender | 27011297 samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | Tesla T4
TensorFlow | NCF | Recommender | 10297010 samples/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | Tesla T4
TensorFlow | NCF | Recommender | 16872638 samples/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 1048576 | MovieLens 20 Million | Tesla T4
TensorFlow | BERT | Attention | 9 sentences/sec | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 3 | SQuAD v1.1 | Tesla T4
TensorFlow | BERT | Attention | 30 sentences/sec | 8x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | 3 | SQuAD v1.1 | Tesla T4

 

NVIDIA® TensorRT™ running on NVIDIA GPUs enables the most efficient deep learning inference performance across multiple application areas and models. This versatility gives data scientists wide latitude to create the optimal low-latency solution. Visit NVIDIA GPU Cloud (NGC) to download any of these containers.

NVIDIA Tesla® V100 Tensor Core GPUs leverage mixed precision to combine high throughput with low latency across every type of neural network. Tesla P4 is an inference GPU designed for optimal power consumption and latency in ultra-efficient scale-out servers. Read the inference whitepaper to learn more about NVIDIA’s inference platform.

Measuring inference performance involves balancing many variables. PLASTER is an acronym that describes the key elements for measuring deep learning performance. Each letter identifies a factor (Programmability, Latency, Accuracy, Size of Model, Throughput, Energy Efficiency, Rate of Learning) that must be considered to arrive at the right set of tradeoffs and to produce a successful deep learning implementation. Refer to NVIDIA’s PLASTER whitepaper for more details.

Inference Image Classification on CNNs with TensorRT

ResNet-50 v1.5 Throughput

DGX-1: 1x Tesla V100-SXM2-16GB, E5-2698 v4 2.2 GHz | TensorRT 5.1 | Batch Size = 128 | 19.07-py3 | Precision: Mixed | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 128 | 19.07-py3 | Precision: INT8 | Dataset: Synthetic

 
 

ResNet-50 v1.5 Latency

DGX-1: 1x Tesla V100-SXM2-16GB, E5-2698 v4 2.2 GHz | TensorRT 5.1 | Batch Size = 1 | 19.07-py3 | Precision: INT8 | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 1 | 19.07-py3 | Precision: INT8 | Dataset: Synthetic

 
 

ResNet-50 v1.5 Power Efficiency

DGX-1: 1x Tesla V100-SXM2-16GB, E5-2698 v4 2.2 GHz | TensorRT 5.1 | Batch Size = 128 | 19.07-py3 | Precision: Mixed | Dataset: Synthetic
Supermicro SYS-4029GP-TRT T4: 1x Tesla T4, Gold 6140 2.3 GHz | TensorRT 5.1 | Batch Size = 128 | 19.07-py3 | Precision: INT8 | Dataset: Synthetic

 

Inference Performance

V100 Inference Performance

Network | Network Type | Batch Size | Throughput | Efficiency | Latency (ms) | GPU | Server | Container | Precision | Dataset | GPU Version
GoogleNet | CNN | 1 | 1602 images/sec | 15 images/sec/watt | 0.62 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 2 | 2162 images/sec | 18 images/sec/watt | 0.93 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 8 | 5159 images/sec | 35 images/sec/watt | 1.6 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 82 | 11869 images/sec | 45 images/sec/watt | 6.9 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
GoogleNet | CNN | 128 | 12400 images/sec | 44 images/sec/watt | 10 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V1 | CNN | 1 | 3623 images/sec | 29 images/sec/watt | 0.28 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V1 | CNN | 2 | 5594 images/sec | 45 images/sec/watt | 0.36 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V1 | CNN | 8 | 14532 images/sec | 89 images/sec/watt | 0.55 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
MobileNet V1 | CNN | 128 | 29567 images/sec | 101 images/sec/watt | 4.3 | 1x V100 | DGX-1 | - | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 1 | 1156 images/sec | 8.7 images/sec/watt | 0.87 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 2 | 1580 images/sec | 10 images/sec/watt | 1.3 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 8 | 3315 images/sec | 21 images/sec/watt | 2.4 | 1x V100 | DGX-1 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 128 | 7595 images/sec | 27 images/sec/watt | 17 | 1x V100 | DGX-1 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50 | CNN | 128 | 7803 images/sec | 23 images/sec/watt | 16 | 1x V100 | DGX-2 | 19.07-py3 | Mixed | Synthetic | V100-SXM3-32GB
ResNet-50 v1.5 | CNN | 1 | 943 images/sec | 7.1 images/sec/watt | 1.1 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 v1.5 | CNN | 2 | 1396 images/sec | 9.7 images/sec/watt | 1.4 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
ResNet-50 v1.5 | CNN | 8 | 3226 images/sec | 19 images/sec/watt | 2.5 | 1x V100 | DGX-1 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50 v1.5 | CNN | 128 | 7223 images/sec | 25 images/sec/watt | 18 | 1x V100 | DGX-1 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-16GB
ResNet-50 v1.5 | CNN | 128 | 7459 images/sec | 22 images/sec/watt | 17 | 1x V100 | DGX-2 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-32GB
VGG16 | CNN | 1 | 821 images/sec | 4 images/sec/watt | 1.2 | 1x V100 | DGX-1 | 19.07-py3 | INT8 | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 2 | 893 images/sec | 4.3 images/sec/watt | 2.2 | 1x V100 | DGX-1 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 8 | 1846 images/sec | 7.4 images/sec/watt | 4.3 | 1x V100 | DGX-1 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-16GB
VGG16 | CNN | 128 | 2845 images/sec | 9.7 images/sec/watt | 45 | 1x V100 | DGX-1 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-16GB
NMT | RNN | 1 | 4124 total tokens/sec | tokens/sec/watt | 12 | 1x V100 | DGX-1 | - | Mixed | WMT16 English-German | V100-SXM2-32GB
NMT | RNN | 2 | 6680 total tokens/sec | tokens/sec/watt | 15 | 1x V100 | DGX-1 | - | Mixed | WMT16 English-German | V100-SXM2-32GB
NMT | RNN | 64 | 58591 total tokens/sec | tokens/sec/watt | 56 | 1x V100 | DGX-1 | - | Mixed | WMT16 English-German | V100-SXM2-32GB
NMT | RNN | 128 | 73426 total tokens/sec | tokens/sec/watt | 90 | 1x V100 | DGX-1 | - | Mixed | WMT16 English-German | V100-SXM2-32GB
NCF | Recommender | 1048576 | 56452298 samples/sec | 125923 samples/sec/watt | | 1x V100 | DGX-1 | 19.07-py3 | Mixed | Synthetic | V100-SXM2-16GB
BERT-LARGE | Attention | 8 | 105 sentences/sec | 0.76 sentences/sec/watt | | 1x V100 | DGX-1 | 19.07-py3 | Mixed | SQuAD v1.1 | V100-SXM2-16GB

TensorRT 5.1 | TensorFlow for NCF and BERT-LARGE
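
The Latency values for the image networks in the V100 table are consistent with batch size divided by throughput, expressed in milliseconds. The sketch below checks that relationship for the GoogleNet rows. It is a simplified model, not NVIDIA's measurement method: the helper `latency_ms` is a hypothetical name, and it assumes one batch is processed at a time with no overlap between batches.

```python
def latency_ms(batch_size: int, throughput_per_sec: float) -> float:
    """Estimate per-batch latency in milliseconds from batch size and throughput."""
    if throughput_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return batch_size / throughput_per_sec * 1000.0


# (batch size, images/sec, listed latency) for GoogleNet, INT8, 1x V100 on DGX-1,
# copied from the V100 Inference Performance table.
googlenet_v100 = [
    (1, 1602, 0.62),
    (2, 2162, 0.93),
    (8, 5159, 1.6),
    (82, 11869, 6.9),
    (128, 12400, 10),
]

for batch, rate, listed in googlenet_v100:
    estimate = latency_ms(batch, rate)
    print(f"batch {batch:>3}: estimated {estimate:.2f} ms, listed {listed}")
```

The estimates match the listed values to within rounding, which illustrates the throughput and latency tradeoff named in PLASTER: larger batches raise images/sec but each batch takes longer to return.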

 

T4 Inference Performance

Network | Network Type | Batch Size | Throughput | Efficiency | Latency (ms) | GPU | Server | Container | Precision | Dataset | GPU Version
GoogleNet | CNN | 1 | 1681 images/sec | 25 images/sec/watt | 0.6 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 2 | 2319 images/sec | 35 images/sec/watt | 0.86 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 8 | 5723 images/sec | 83 images/sec/watt | 1.4 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 52 | 7580 images/sec | 109 images/sec/watt | 6.9 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
GoogleNet | CNN | 128 | 7699 images/sec | 111 images/sec/watt | 17 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
MobileNet V1 | CNN | 1 | 3685 images/sec | 70 images/sec/watt | 0.27 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
MobileNet V1 | CNN | 2 | 6006 images/sec | 97 images/sec/watt | 0.33 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
MobileNet V1 | CNN | 8 | 13859 images/sec | 199 images/sec/watt | 0.58 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
MobileNet V1 | CNN | 128 | 17438 images/sec | 251 images/sec/watt | 7.3 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 1 | 1077 images/sec | 15 images/sec/watt | 0.93 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 2 | 1702 images/sec | 26 images/sec/watt | 1.2 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 8 | 3854 images/sec | 55 images/sec/watt | 2.1 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 | CNN | 128 | 5288 images/sec | 76 images/sec/watt | 24 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 v1.5 | CNN | 1 | 1037 images/sec | 15 images/sec/watt | 0.96 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 v1.5 | CNN | 2 | 1710 images/sec | 26 images/sec/watt | 1.2 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 v1.5 | CNN | 8 | 3730 images/sec | 53 images/sec/watt | 2.2 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
ResNet-50 v1.5 | CNN | 128 | 5013 images/sec | 71 images/sec/watt | 26 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 1 | 726 images/sec | 10 images/sec/watt | 1.4 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 2 | 1064 images/sec | 15 images/sec/watt | 1.9 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 8 | 1670 images/sec | 24 images/sec/watt | 4.8 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
VGG16 | CNN | 128 | 1956 images/sec | 28 images/sec/watt | 65 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
NCF | Recommender | 1 | 7716 samples/sec | 281 samples/sec/watt | 0.14 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
NCF | Recommender | 64 | 491050 samples/sec | 16957 samples/sec/watt | 0.14 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
NCF | Recommender | 25000 | 53908841 samples/sec | 782056 samples/sec/watt | 1.9 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
NCF | Recommender | 100000 | 53908841 samples/sec | 782056 samples/sec/watt | 1.9 | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | INT8 | Synthetic | Tesla T4
BERT-LARGE | Attention | 8 | 42 sentences/sec | 0.80 sentences/sec/watt | | 1x T4 | Supermicro SYS-4029GP-TRT T4 | 19.07-py3 | Mixed | SQuAD v1.1 | Tesla T4

TensorRT 5.1 | TensorFlow for BERT-LARGE
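
Because the Efficiency column is throughput per watt, dividing Throughput by Efficiency gives a rough estimate of the power draw behind each row. The sketch below applies this to three batch-size-128 rows from the T4 table. It is only an approximation: the function name `implied_power_watts` is hypothetical, the efficiency values are rounded in the table, and the page does not state how power was measured.

```python
def implied_power_watts(throughput_per_sec: float, efficiency_per_watt: float) -> float:
    """Back out approximate power draw from throughput and throughput-per-watt."""
    if efficiency_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return throughput_per_sec / efficiency_per_watt


# (images/sec, images/sec/watt) at batch size 128, INT8, 1x T4,
# copied from the T4 Inference Performance table.
t4_batch128 = {
    "GoogleNet": (7699, 111),
    "ResNet-50 v1.5": (5013, 71),
    "VGG16": (1956, 28),
}

implied = {name: implied_power_watts(rate, eff) for name, (rate, eff) in t4_batch128.items()}
for name, watts in implied.items():
    print(f"{name}: about {watts:.0f} W")
```

All three rows land near 70 W, which suggests that at large batch sizes the differences in images/sec/watt between networks come from throughput rather than from power draw.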

 

Last updated: August 19th, 2019