This page provides initial benchmarking results of deep learning inference performance and energy efficiency for Jetson AGX Xavier on networks including ResNet-18 FCN, ResNet-50, VGG19, GoogleNet, and AlexNet using JetPack 4.1.1 Developer Preview software. Performance and power characteristics will continue to improve over time as NVIDIA releases software updates containing additional features and optimizations for Jetson AGX Xavier.

ResNet-18 FCN (Fully Convolutional Network) for semantic segmentation operates at full HD resolution (2048x1024) and is representative of autonomous machine workloads involving perception, path planning, and navigation. ResNet-50, VGG19, GoogleNet, and AlexNet perform recognition and classification on 224x224 image patches and are commonly used as the encoder backbones of object detection and segmentation networks. A batch size of 8 or higher at the lower resolution can be used to approximate the performance and latency of a batch size of 1 at higher resolutions. Robotic platforms and autonomous machines often incorporate multiple cameras and sensors whose inputs can be batch processed for increased performance. They also commonly detect regions of interest (ROIs) and then classify those ROIs in batches.
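As a rough illustration of the batch-size approximation above, the sketch below compares pixel counts. It assumes that pixel count is a reasonable first-order proxy for convolutional workload; that is a simplification introduced here, not a method stated on this page, and the function names are hypothetical.

```python
import math

# Resolutions taken from the text above.
FCN_WIDTH, FCN_HEIGHT = 2048, 1024   # ResNet-18 FCN input frame
PATCH_SIDE = 224                     # classification patch edge length


def batch_pixels(batch_size: int, side: int = PATCH_SIDE) -> int:
    """Total number of pixels in a batch of square patches."""
    return batch_size * side * side


def equivalent_square_side(batch_size: int, side: int = PATCH_SIDE) -> float:
    """Edge length of a single square image with the same pixel count as the batch."""
    return math.sqrt(batch_pixels(batch_size, side))


# How many 224x224 patches hold as many pixels as one 2048x1024 frame.
patches_per_frame = (FCN_WIDTH * FCN_HEIGHT) / batch_pixels(1)

if __name__ == "__main__":
    print(f"batch of 8 patches: {batch_pixels(8)} pixels")
    print(f"equivalent single image: about {equivalent_square_side(8):.0f} px square")
    print(f"one 2048x1024 frame holds about {patches_per_frame:.1f} patches of pixels")
```

Under this assumption, a batch of 8 patches carries roughly the pixel load of one image about 634 pixels on a side, which is why larger batches at 224x224 are used as a stand-in for single higher-resolution inputs.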

With the recent availability of substantial computational resources at the edge, applications are deploying increasingly complex networks, such as variants of ResNet and VGG, for improved accuracy. GoogleNet and AlexNet are provided here for historical completeness (denoted in the tables in gray). In the future, this document will be updated to include additional networks for tasks such as object detection and motion planning, as well as networks that incorporate Recurrent Neural Networks (RNNs) for tasks such as speech recognition and image captioning.

The benchmark measurements below were collected with the following environment:

Total module power consumption and energy efficiency measurements include the usage of CPU, GPU, DLAs, memory, miscellaneous SoC power, I/O, and regulator efficiency losses on all rails. Power consumption was measured using INA voltage and current monitors onboard the module.
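The MODULE PERFORMANCE / watt column in the tables below appears to be throughput divided by total module power. The following sketch, using hypothetical helper names, reproduces two ResNet-50 entries from the 15W table and also expresses the same data as energy per image (watts divided by images per second gives joules per image).

```python
def perf_per_watt(img_per_sec: float, module_watts: float) -> float:
    """Throughput per watt of total module power (img/sec/W)."""
    return img_per_sec / module_watts


def millijoules_per_image(img_per_sec: float, module_watts: float) -> float:
    """Energy spent per inferred image, in millijoules."""
    return 1000.0 * module_watts / img_per_sec


# (network, batch size, img/sec, module watts) copied from the 15W table.
ROWS = [
    ("ResNet-50", 2, 508, 12.8),
    ("ResNet-50", 8, 717, 14.4),
]

if __name__ == "__main__":
    for network, batch, perf, watts in ROWS:
        print(
            f"{network} batch {batch}: "
            f"{perf_per_watt(perf, watts):.1f} img/sec/W, "
            f"{millijoules_per_image(perf, watts):.1f} mJ/image"
        )
```

Small differences from the published column are possible on other rows, since the table inputs are themselves rounded.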

Estimates of future performance, incorporating software enhancements such as INT8 support for the DLAs and additional GPU optimizations, are provided for various network configurations. The future performance estimates presume concurrent use of the GPU (INT8) and two DLAs (INT8).
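To put the projections in context, the sketch below computes the implied speedup for two ResNet-50 rows of the 15W table and checks that the projected efficiency column is consistent with projected throughput divided by projected power. The helper names are hypothetical, and the figures are the page's own estimates, not measurements.

```python
def speedup(current_perf: float, future_perf: float) -> float:
    """Ratio of projected to measured throughput."""
    return future_perf / current_perf


def projected_perf_per_watt(future_perf: float, future_watts: float) -> float:
    """Projected throughput per watt (img/sec/W)."""
    return future_perf / future_watts


# (batch size, measured img/sec, projected img/sec, projected watts)
# for ResNet-50 in 15W mode, copied from the table below.
RESNET50_15W = [
    (1, 358, 800, 12),
    (8, 717, 1360, 14),
]

if __name__ == "__main__":
    for batch, perf, future_perf, future_watts in RESNET50_15W:
        print(
            f"batch {batch}: {speedup(perf, future_perf):.2f}x projected speedup, "
            f"{projected_perf_per_watt(future_perf, future_watts):.0f} img/sec/W projected"
        )
```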

15W Mode (HD)

| NETWORK | BATCH SIZE | PERF (img/sec) | LATENCY (ms) | MODULE POWER (watts) | MODULE PERFORMANCE / watt |
|---|---|---|---|---|---|
| ResNet-18 FCN | 1 | 34 | 29.2 | 12.3 | 2.8 |
| ResNet-18 FCN | 2 | 36 | 55.4 | 12.5 | 2.9 |
| ResNet-18 FCN | 4 | 41 | 96.6 | 12.7 | 3.3 |
| ResNet-18 FCN | 8 | 48 | 167.4 | 12.8 | 3.7 |
| ResNet-18 FCN | 16 | 49 | 323.9 | 13.0 | 3.8 |
| ResNet-18 FCN | 32 | 51 | 622.6 | 13.2 | 3.9 |

MAX-N Mode* (HD)

| NETWORK | BATCH SIZE | PERF (img/sec) | LATENCY (ms) | MODULE POWER (watts) | MODULE PERFORMANCE / watt |
|---|---|---|---|---|---|
| ResNet-18 FCN | 1 | 64 | 15.6 | 35.4 | 1.8 |
| ResNet-18 FCN | 2 | 68 | 29.6 | 36.0 | 1.9 |
| ResNet-18 FCN | 4 | 74 | 54.1 | 36.9 | 2.0 |
| ResNet-18 FCN | 8 | 82 | 97.6 | 37.3 | 2.2 |
| ResNet-18 FCN | 16 | 86 | 186.0 | 38.3 | 2.2 |
| ResNet-18 FCN | 32 | 88 | 363.6 | 38.7 | 2.3 |

15W Mode

| NETWORK | BATCH SIZE | PERF (img/sec) | LATENCY (ms) | MODULE POWER (watts) | MODULE PERFORMANCE / watt | FUTURE PERF (img/sec) | FUTURE MODULE POWER (watts) | FUTURE MODULE PERFORMANCE / watt |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 1 | 358 | 2.8 | 11.5 | 31.2 | 800 | 12 | 67 |
| ResNet-50 | 2 | 508 | 3.9 | 12.8 | 39.7 | 1090 | 14 | 78 |
| ResNet-50 | 4 | 634 | 6.3 | 13.6 | 46.5 | 1280 | 14 | 91 |
| ResNet-50 | 8 | 717 | 11.2 | 14.4 | 49.8 | 1360 | 14 | 97 |
| ResNet-50 | 16 | 767 | 20.9 | 14.9 | 51.3 | 1410 | 15 | 94 |
| ResNet-50 | 32 | 841 | 38.0 | 15.1 | 55.7 | 1430 | 15 | 95 |
| ResNet-50 | 64 | 869 | 73.6 | 15.1 | 57.6 | 1430 | 15 | 95 |
| ResNet-50 | 128 | 879 | 145.7 | 15.2 | 57.7 | 1430 | 15 | 95 |
| VGG19 | 1 | 84 | 11.9 | 14.2 | 5.9 | 230 | 12 | 19 |
| VGG19 | 2 | 132 | 15.2 | 14.4 | 9.1 | 290 | 13 | 22 |
| VGG19 | 4 | 174 | 22.9 | 14.6 | 11.9 | 320 | 13 | 25 |
| VGG19 | 8 | 191 | 41.8 | 14.9 | 12.8 | 340 | 13 | 26 |
| VGG19 | 16 | 231 | 69.4 | 15.0 | 15.3 | 350 | 13 | 27 |
| VGG19 | 32 | 260 | 123.1 | 15.2 | 17.1 | 350 | 13 | 27 |
| VGG19 | 64 | 269 | 238.0 | 15.3 | 17.6 | 350 | 13 | 27 |
| VGG19 | 128 | 274 | 467.8 | 15.4 | 17.8 | 350 | 13 | 27 |
| GoogleNet | 1 | 542 | 1.8 | 9.8 | 55.0 | 1310 | 11 | 119 |
| GoogleNet | 2 | 684 | 2.9 | 10.4 | 65.8 | 1670 | 13 | 128 |
| GoogleNet | 4 | 890 | 4.5 | 11.4 | 78.1 | 1920 | 15 | 128 |
| GoogleNet | 8 | 1015 | 7.9 | 12.0 | 84.4 | 1940 | 15 | 129 |
| GoogleNet | 16 | 1121 | 14.3 | 12.8 | 87.6 | 1950 | 15 | 130 |
| GoogleNet | 32 | 1184 | 27.0 | 13.2 | 90.0 | 1980 | 15 | 132 |
| GoogleNet | 64 | 1235 | 51.8 | 13.2 | 93.6 | 1980 | 15 | 132 |
| GoogleNet | 128 | 1255 | 102.0 | 13.3 | 94.3 | 1980 | 15 | 132 |
| AlexNet | 1 | 299 | 3.3 | 14.0 | 21.3 | 1090 | 12 | 91 |
| AlexNet | 2 | 466 | 4.3 | 14.3 | 32.6 | 1790 | 12 | 149 |
| AlexNet | 4 | 721 | 5.5 | 14.9 | 48.5 | 2650 | 13 | 204 |
| AlexNet | 8 | 990 | 8.1 | 13.5 | 73.4 | 3510 | 13 | 270 |
| AlexNet | 16 | 1291 | 12.4 | 14.2 | 90.8 | 4200 | 14 | 300 |
| AlexNet | 32 | 1713 | 18.7 | 14.4 | 119.0 | 4670 | 14 | 334 |
| AlexNet | 64 | 2087 | 30.7 | 14.8 | 141.3 | 4670 | 14 | 334 |
| AlexNet | 128 | 2270 | 56.4 | 14.9 | 152.5 | 4670 | 14 | 334 |
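
The latency figures in the table above are consistent with per-batch latency being the batch size divided by throughput. The sketch below, with a hypothetical helper name, checks this against three ResNet-50 rows; it is an observation about the published numbers, not a description of how they were measured.

```python
def batch_latency_ms(batch_size: int, img_per_sec: float) -> float:
    """Time to process one full batch, in milliseconds, at the given throughput."""
    return 1000.0 * batch_size / img_per_sec


# (batch size, img/sec, published latency in ms) for ResNet-50 in 15W mode.
RESNET50_15W = [
    (1, 358, 2.8),
    (8, 717, 11.2),
    (128, 879, 145.7),
]

if __name__ == "__main__":
    for batch, perf, published_ms in RESNET50_15W:
        derived_ms = batch_latency_ms(batch, perf)
        print(f"batch {batch}: derived {derived_ms:.1f} ms, published {published_ms} ms")
```

The trade-off is visible directly: moving from batch 1 to batch 128 raises throughput by about 2.5x while per-batch latency grows by roughly 50x.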

MAX-N Mode*

| NETWORK | BATCH SIZE | PERF (img/sec) | LATENCY (ms) | MODULE POWER (watts) | MODULE PERFORMANCE / watt | FUTURE PERF (img/sec) | FUTURE MODULE POWER (watts) | FUTURE MODULE PERFORMANCE / watt |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 1 | 656 | 1.5 | 31 | 21.2 | 1390 | 29 | 48 |
| ResNet-50 | 2 | 915 | 2.2 | 34.2 | 26.7 | 1970 | 34 | 58 |
| ResNet-50 | 4 | 1143 | 3.5 | 37.2 | 30.7 | 2320 | 37 | 63 |
| ResNet-50 | 8 | 1293 | 6.2 | 39.3 | 32.9 | 2490 | 38 | 66 |
| ResNet-50 | 16 | 1388 | 11.5 | 40.7 | 34.1 | 2620 | 39 | 67 |
| ResNet-50 | 32 | 1561 | 20.5 | 41.6 | 37.5 | 2710 | 40 | 68 |
| ResNet-50 | 64 | 1612 | 39.7 | 42.1 | 38.3 | 2710 | 40 | 68 |
| ResNet-50 | 128 | 1631 | 78.5 | 42.4 | 38.5 | 2710 | 40 | 68 |
| VGG19 | 1 | 156 | 6.4 | 38.2 | 4.1 | 420 | 32 | 13 |
| VGG19 | 2 | 236 | 8.5 | 40.6 | 5.8 | 550 | 35 | 16 |
| VGG19 | 4 | 316 | 12.7 | 41 | 7.7 | 620 | 36 | 17 |
| VGG19 | 8 | 375 | 21.4 | 43.9 | 8.5 | 660 | 37 | 18 |
| VGG19 | 16 | 451 | 35.5 | 44.7 | 10.1 | 680 | 37 | 18 |
| VGG19 | 32 | 502 | 63.7 | 45.2 | 11.1 | 680 | 37 | 18 |
| VGG19 | 64 | 521 | 122.9 | 45.7 | 11.4 | 680 | 37 | 18 |
| VGG19 | 128 | 531 | 241.1 | 45.9 | 11.6 | 680 | 37 | 18 |
| GoogleNet | 1 | 1030 | 1 | 27.1 | 38 | 2290 | 29 | 79 |
| GoogleNet | 2 | 1310 | 1.5 | 29.8 | 43.9 | 2980 | 34 | 88 |
| GoogleNet | 4 | 1644 | 2.4 | 31.3 | 52.4 | 3560 | 39 | 91 |
| GoogleNet | 8 | 1897 | 4.2 | 36 | 52.7 | 3980 | 42 | 95 |
| GoogleNet | 16 | 2085 | 7.7 | 36 | 57.9 | 4250 | 45 | 94 |
| GoogleNet | 32 | 2234 | 14.3 | 38 | 58.9 | 4410 | 46 | 96 |
| GoogleNet | 64 | 2373 | 27 | 41.5 | 57.2 | 4410 | 46 | 96 |
| GoogleNet | 128 | 2414 | 53 | 41.7 | 57.9 | 4410 | 46 | 96 |
| AlexNet | 1 | 483 | 2.1 | 33.9 | 14.3 | 1810 | 27 | 67 |
| AlexNet | 2 | 774 | 2.6 | 34.9 | 22.2 | 3060 | 29 | 106 |
| AlexNet | 4 | 1231 | 3.2 | 37 | 33.3 | 4660 | 32 | 146 |
| AlexNet | 8 | 1734 | 4.6 | 40.6 | 42.7 | 6390 | 36 | 178 |
| AlexNet | 16 | 2535 | 6.3 | 39.7 | 63.8 | 7840 | 38 | 206 |
| AlexNet | 32 | 3338 | 9.6 | 41.1 | 81.2 | 8860 | 41 | 216 |
| AlexNet | 64 | 4129 | 15.5 | 41.7 | 99 | 8860 | 41 | 216 |
| AlexNet | 128 | 4504 | 28.4 | 42.4 | 106.2 | 8860 | 41 | 216 |

15W Mode (Per-Rail)

The following provides a breakdown of the individual power rails, with each rail including regulator efficiency losses, for a subset of ResNet-50:

| NETWORK | BATCH SIZE | PERF (img/sec) | SoC POWER (watts) | CPU POWER (watts) | GPU POWER (watts) | DLA POWER (watts) | DRAM POWER (watts) | 5V I/O POWER (watts) | MODULE POWER (watts) |
|---|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 2 | 508 | 2.2 | 1.1 | 3.5 | 1.4 | 1.9 | 2.7 | 12.8 |
| ResNet-50 | 8 | 717 | 2.3 | 1.1 | 4.6 | 1.4 | 2.2 | 2.8 | 14.4 |
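
As a sanity check on the breakdown above, the six rail figures in each row sum to the reported module power. The sketch below verifies this and derives the GPU's share of the total; the dictionary layout and helper name are illustrative choices, not part of the original measurements.

```python
# Per-rail power in watts for ResNet-50 in 15W mode, copied from the table above.
RAILS_BY_BATCH = {
    2: {"SoC": 2.2, "CPU": 1.1, "GPU": 3.5, "DLA": 1.4, "DRAM": 1.9, "5V I/O": 2.7},
    8: {"SoC": 2.3, "CPU": 1.1, "GPU": 4.6, "DLA": 1.4, "DRAM": 2.2, "5V I/O": 2.8},
}
REPORTED_MODULE_WATTS = {2: 12.8, 8: 14.4}


def rail_share_percent(rails: dict, rail: str) -> float:
    """Percentage of summed rail power drawn by one rail."""
    return 100.0 * rails[rail] / sum(rails.values())


if __name__ == "__main__":
    for batch, rails in RAILS_BY_BATCH.items():
        total = sum(rails.values())
        print(
            f"batch {batch}: rails sum to {total:.1f} W "
            f"(reported {REPORTED_MODULE_WATTS[batch]} W), "
            f"GPU share {rail_share_percent(rails, 'GPU'):.1f}%"
        )
```

In these two rows, most of the increase in module power from batch 2 to batch 8 comes from the GPU and DRAM rails, while the CPU and DLA rails are unchanged.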

Notes

* Module power in MAX-N mode with the JetPack 4.1.1 Developer Preview release may exceed TDP for some configurations. Users should tune the power profile and configuration to stay within the TDP for their application. Future versions of JetPack will further optimize performance and power.

** Example trtexec launch commands:

For GPU

$ ./trtexec --avgRuns=100 --deploy=resnet50.prototxt --int8 --batch=8 --iterations=10000 --output=prob --useSpinWait

For DLA (Core 0)

$ ./trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useDLACore=0 --useSpinWait --allowGPUFallback

For DLA (Core 1)

$ ./trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useDLACore=1 --useSpinWait --allowGPUFallback

Multiple instances of trtexec can be launched simultaneously in this fashion to run the GPU and DLAs concurrently. DLA supports a maximum batch size of 32 depending on the network, while the GPU can run higher batch sizes concurrently.
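
One way to script the concurrent launch is sketched below. It only assembles the same flags shown in the example commands above and starts one process per engine; the function names, the `./trtexec` path, and treating 32 as a hard DLA batch limit are assumptions made for illustration (the actual limit depends on the network).

```python
import subprocess

TRTEXEC = "./trtexec"   # assumed path, matching the example commands above
DLA_MAX_BATCH = 32      # upper bound stated above; the real limit depends on the network


def build_command(deploy, batch, dla_core=None):
    """Build a trtexec argument list for the GPU (dla_core=None) or for one DLA core."""
    if dla_core is not None and batch > DLA_MAX_BATCH:
        raise ValueError(f"DLA batch size {batch} exceeds {DLA_MAX_BATCH}")
    cmd = [TRTEXEC, "--avgRuns=100", f"--deploy={deploy}"]
    # The examples above use INT8 on the GPU and FP16 on the DLAs.
    cmd.append("--int8" if dla_core is None else "--fp16")
    cmd += [f"--batch={batch}", "--iterations=10000", "--output=prob"]
    if dla_core is not None:
        cmd.append(f"--useDLACore={dla_core}")
    cmd.append("--useSpinWait")
    if dla_core is not None:
        cmd.append("--allowGPUFallback")
    return cmd


def launch_all(commands):
    """Start every command at once, then wait for all of them; returns the exit codes."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    commands = [
        build_command("resnet50.prototxt", 8),              # GPU
        build_command("resnet50.prototxt", 8, dla_core=0),  # DLA core 0
        build_command("resnet50.prototxt", 8, dla_core=1),  # DLA core 1
    ]
    for cmd in commands:
        print(" ".join(cmd))
    # On a device with trtexec available, run them concurrently with:
    # launch_all(commands)
```

Starting all processes before waiting on any of them is what makes the three engines overlap; waiting on each process right after starting it would serialize the runs.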

Last updated: Nov 20, 2018 | NVIDIA Corporation | Subject to Change