For the latest Jetson benchmarks go to https://developer.nvidia.com/embedded/jetson-benchmarks

This page provides initial benchmarking results of deep learning inference performance and energy efficiency for Jetson AGX Xavier on networks including ResNet-18 FCN, ResNet-50, VGG19, GoogleNet, and AlexNet using JetPack 4.1.1 Developer Preview software. Performance and power characteristics will continue to improve over time as NVIDIA releases software updates containing additional features and optimizations for Jetson AGX Xavier.

ResNet-18 FCN (Fully Convolutional Network) for semantic segmentation operates at full HD resolution (2048x1024) and is representative of autonomous machine workloads involving perception, path planning, and navigation. ResNet-50, VGG19, GoogleNet, and AlexNet perform recognition and classification on image patches at 224x224 resolution and are commonly used as the encoder backbones of various object detection and segmentation networks. A batch size of 8 or higher at the lower resolution approximates the performance and latency of a batch size of 1 at the higher resolution. Robotic platforms and autonomous machines often incorporate multiple cameras and sensors whose streams can be batch-processed for increased performance, in addition to performing detection of regions of interest (ROIs) followed by further classification of the ROIs in batches.
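The throughput-versus-latency trade-off of batching follows from simple arithmetic: all images in a batch complete together, so per-batch latency is batch size divided by throughput. A minimal sketch, using figures taken from the 15W ResNet-18 FCN table later in this document:

```python
def batch_latency_ms(batch_size: int, images_per_sec: float) -> float:
    """Time to complete one batch, in milliseconds."""
    return batch_size / images_per_sec * 1000.0

# ResNet-18 FCN in 15W mode: batch 1 runs at 34 img/sec, batch 8 at 48 img/sec.
print(round(batch_latency_ms(1, 34), 1))   # ~29.4 ms (table reports 29.2 ms)
print(round(batch_latency_ms(8, 48), 1))   # ~166.7 ms (table reports 167.4 ms)
```

The computed values land within a fraction of a millisecond of the measured latencies in the table, since trtexec's reported latency is dominated by the batch execution time itself.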

With the recent availability of substantial computational resources at the edge, applications are deploying increasingly complex networks, such as variants of ResNet and VGG, for improved accuracy. GoogleNet and AlexNet results are included here for historical completeness, and in the future this document will be updated with additional networks for tasks like object detection and motion planning, as well as networks that incorporate Recurrent Neural Networks (RNNs) for tasks such as speech recognition and image captioning.

The benchmark measurements below were collected with the following environment: JetPack 4.1.1 Developer Preview software, with networks run using TensorRT's trtexec benchmarking tool (see the example launch commands in the Notes section).

Total module power consumption and energy efficiency measurements include the usage of CPU, GPU, DLAs, memory, miscellaneous SoC power, I/O, and regulator efficiency losses on all rails. Power consumption was measured using INA voltage and current monitors onboard the module.

Estimates of future performance, incorporating software enhancements such as INT8 support for the DLAs and additional GPU optimizations, are provided for various network configurations. The future performance estimates presume concurrent use of the GPU (INT8) and both DLAs (INT8).
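Because the GPU and the two DLAs are independent engines that run concurrently, an aggregate estimate of this kind can be sketched as the sum of per-engine throughputs. The numbers below are purely illustrative placeholders, not measured values:

```python
# Hypothetical sketch: combined throughput with the GPU and both DLAs running
# concurrently is approximately the sum of their standalone throughputs
# (assuming no contention on shared resources such as memory bandwidth).

def combined_throughput(gpu_img_per_sec: float, dla_img_per_sec: float,
                        num_dlas: int = 2) -> float:
    return gpu_img_per_sec + num_dlas * dla_img_per_sec

# Placeholder per-engine numbers, for illustration only:
print(combined_throughput(1000.0, 200.0))  # 1400.0
```

In practice contention on DRAM bandwidth makes the true aggregate somewhat lower than this simple sum, which is why measured concurrent numbers should be preferred once available.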

15W Mode (HD)

| NETWORK | BATCH SIZE | PERF (img/sec) | LATENCY (ms) | MODULE POWER (watts) | MODULE PERFORMANCE / WATT |
|---|---|---|---|---|---|
| ResNet-18 FCN | 1 | 34 | 29.2 | 12.3 | 2.8 |
| ResNet-18 FCN | 2 | 36 | 55.4 | 12.5 | 2.9 |
| ResNet-18 FCN | 4 | 41 | 96.6 | 12.7 | 3.3 |
| ResNet-18 FCN | 8 | 48 | 167.4 | 12.8 | 3.7 |
| ResNet-18 FCN | 16 | 49 | 323.9 | 13.0 | 3.8 |
| ResNet-18 FCN | 32 | 51 | 622.6 | 13.2 | 3.9 |
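The efficiency column can be recomputed directly from the other two: module performance per watt is simply throughput divided by module power. A quick check, using the batch-8 row of the table above:

```python
def perf_per_watt(images_per_sec: float, module_power_w: float) -> float:
    """Module energy efficiency in images per second per watt."""
    return images_per_sec / module_power_w

# Batch-8 row: 48 img/sec at 12.8 W module power.
print(round(perf_per_watt(48, 12.8), 2))  # ~3.75 (the table rounds to 3.7)
```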

MAX-N Mode* (HD)

| NETWORK | BATCH SIZE | PERF (img/sec) | LATENCY (ms) | MODULE POWER (watts) | MODULE PERFORMANCE / WATT |
|---|---|---|---|---|---|
| ResNet-18 FCN | 1 | 64 | 15.6 | 35.4 | 1.8 |
| ResNet-18 FCN | 2 | 68 | 29.6 | 36.0 | 1.9 |
| ResNet-18 FCN | 4 | 74 | 54.1 | 36.9 | 2.0 |
| ResNet-18 FCN | 8 | 82 | 97.6 | 37.3 | 2.2 |
| ResNet-18 FCN | 16 | 86 | 186.0 | 38.3 | 2.2 |
| ResNet-18 FCN | 32 | 88 | 363.6 | 38.7 | 2.3 |

15W Mode

| NETWORK | BATCH SIZE | PERF (img/sec) | LATENCY (ms) | MODULE POWER (watts) | MODULE PERFORMANCE / WATT | FUTURE PERF (img/sec) | FUTURE MODULE POWER (watts) | FUTURE MODULE PERFORMANCE / WATT |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 1 | 358 | 2.8 | 11.5 | 31.2 | 800 | 12 | 67 |
| ResNet-50 | 2 | 508 | 3.9 | 12.8 | 39.7 | 1090 | 14 | 78 |
| ResNet-50 | 4 | 634 | 6.3 | 13.6 | 46.5 | 1280 | 14 | 91 |
| ResNet-50 | 8 | 717 | 11.2 | 14.4 | 49.8 | 1360 | 14 | 97 |
| ResNet-50 | 16 | 767 | 20.9 | 14.9 | 51.3 | 1410 | 15 | 94 |
| ResNet-50 | 32 | 841 | 38.0 | 15.1 | 55.7 | 1430 | 15 | 95 |
| ResNet-50 | 64 | 869 | 73.6 | 15.1 | 57.6 | 1430 | 15 | 95 |
| ResNet-50 | 128 | 879 | 145.7 | 15.2 | 57.7 | 1430 | 15 | 95 |
| VGG19 | 1 | 84 | 11.9 | 14.2 | 5.9 | 230 | 12 | 19 |
| VGG19 | 2 | 132 | 15.2 | 14.4 | 9.1 | 290 | 13 | 22 |
| VGG19 | 4 | 174 | 22.9 | 14.6 | 11.9 | 320 | 13 | 25 |
| VGG19 | 8 | 191 | 41.8 | 14.9 | 12.8 | 340 | 13 | 26 |
| VGG19 | 16 | 231 | 69.4 | 15.0 | 15.3 | 350 | 13 | 27 |
| VGG19 | 32 | 260 | 123.1 | 15.2 | 17.1 | 350 | 13 | 27 |
| VGG19 | 64 | 269 | 238.0 | 15.3 | 17.6 | 350 | 13 | 27 |
| VGG19 | 128 | 274 | 467.8 | 15.4 | 17.8 | 350 | 13 | 27 |
| GoogleNet | 1 | 542 | 1.8 | 9.8 | 55.0 | 1310 | 11 | 119 |
| GoogleNet | 2 | 684 | 2.9 | 10.4 | 65.8 | 1670 | 13 | 128 |
| GoogleNet | 4 | 890 | 4.5 | 11.4 | 78.1 | 1920 | 15 | 128 |
| GoogleNet | 8 | 1015 | 7.9 | 12.0 | 84.4 | 1940 | 15 | 129 |
| GoogleNet | 16 | 1121 | 14.3 | 12.8 | 87.6 | 1950 | 15 | 130 |
| GoogleNet | 32 | 1184 | 27.0 | 13.2 | 90.0 | 1980 | 15 | 132 |
| GoogleNet | 64 | 1235 | 51.8 | 13.2 | 93.6 | 1980 | 15 | 132 |
| GoogleNet | 128 | 1255 | 102.0 | 13.3 | 94.3 | 1980 | 15 | 132 |
| AlexNet | 1 | 299 | 3.3 | 14.0 | 21.3 | 1090 | 12 | 91 |
| AlexNet | 2 | 466 | 4.3 | 14.3 | 32.6 | 1790 | 12 | 149 |
| AlexNet | 4 | 721 | 5.5 | 14.9 | 48.5 | 2650 | 13 | 204 |
| AlexNet | 8 | 990 | 8.1 | 13.5 | 73.4 | 3510 | 13 | 270 |
| AlexNet | 16 | 1291 | 12.4 | 14.2 | 90.8 | 4200 | 14 | 300 |
| AlexNet | 32 | 1713 | 18.7 | 14.4 | 119.0 | 4670 | 14 | 334 |
| AlexNet | 64 | 2087 | 30.7 | 14.8 | 141.3 | 4670 | 14 | 334 |
| AlexNet | 128 | 2270 | 56.4 | 14.9 | 152.5 | 4670 | 14 | 334 |

MAX-N Mode*

| NETWORK | BATCH SIZE | PERF (img/sec) | LATENCY (ms) | MODULE POWER (watts) | MODULE PERFORMANCE / WATT | FUTURE PERF (img/sec) | FUTURE MODULE POWER (watts) | FUTURE MODULE PERFORMANCE / WATT |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 1 | 656 | 1.5 | 31.0 | 21.2 | 1390 | 29 | 48 |
| ResNet-50 | 2 | 915 | 2.2 | 34.2 | 26.7 | 1970 | 34 | 58 |
| ResNet-50 | 4 | 1143 | 3.5 | 37.2 | 30.7 | 2320 | 37 | 63 |
| ResNet-50 | 8 | 1293 | 6.2 | 39.3 | 32.9 | 2490 | 38 | 66 |
| ResNet-50 | 16 | 1388 | 11.5 | 40.7 | 34.1 | 2620 | 39 | 67 |
| ResNet-50 | 32 | 1561 | 20.5 | 41.6 | 37.5 | 2710 | 40 | 68 |
| ResNet-50 | 64 | 1612 | 39.7 | 42.1 | 38.3 | 2710 | 40 | 68 |
| ResNet-50 | 128 | 1631 | 78.5 | 42.4 | 38.5 | 2710 | 40 | 68 |
| VGG19 | 1 | 156 | 6.4 | 38.2 | 4.1 | 420 | 32 | 13 |
| VGG19 | 2 | 236 | 8.5 | 40.6 | 5.8 | 550 | 35 | 16 |
| VGG19 | 4 | 316 | 12.7 | 41.0 | 7.7 | 620 | 36 | 17 |
| VGG19 | 8 | 375 | 21.4 | 43.9 | 8.5 | 660 | 37 | 18 |
| VGG19 | 16 | 451 | 35.5 | 44.7 | 10.1 | 680 | 37 | 18 |
| VGG19 | 32 | 502 | 63.7 | 45.2 | 11.1 | 680 | 37 | 18 |
| VGG19 | 64 | 521 | 122.9 | 45.7 | 11.4 | 680 | 37 | 18 |
| VGG19 | 128 | 531 | 241.1 | 45.9 | 11.6 | 680 | 37 | 18 |
| GoogleNet | 1 | 1030 | 1.0 | 27.1 | 38.0 | 2290 | 29 | 79 |
| GoogleNet | 2 | 1310 | 1.5 | 29.8 | 43.9 | 2980 | 34 | 88 |
| GoogleNet | 4 | 1644 | 2.4 | 31.3 | 52.4 | 3560 | 39 | 91 |
| GoogleNet | 8 | 1897 | 4.2 | 36.0 | 52.7 | 3980 | 42 | 95 |
| GoogleNet | 16 | 2085 | 7.7 | 36.0 | 57.9 | 4250 | 45 | 94 |
| GoogleNet | 32 | 2234 | 14.3 | 38.0 | 58.9 | 4410 | 46 | 96 |
| GoogleNet | 64 | 2373 | 27.0 | 41.5 | 57.2 | 4410 | 46 | 96 |
| GoogleNet | 128 | 2414 | 53.0 | 41.7 | 57.9 | 4410 | 46 | 96 |
| AlexNet | 1 | 483 | 2.1 | 33.9 | 14.3 | 1810 | 27 | 67 |
| AlexNet | 2 | 774 | 2.6 | 34.9 | 22.2 | 3060 | 29 | 106 |
| AlexNet | 4 | 1231 | 3.2 | 37.0 | 33.3 | 4660 | 32 | 146 |
| AlexNet | 8 | 1734 | 4.6 | 40.6 | 42.7 | 6390 | 36 | 178 |
| AlexNet | 16 | 2535 | 6.3 | 39.7 | 63.8 | 7840 | 38 | 206 |
| AlexNet | 32 | 3333 | 9.6 | 41.1 | 81.2 | 8860 | 41 | 216 |
| AlexNet | 64 | 4129 | 15.5 | 41.7 | 99.0 | 8860 | 41 | 216 |
| AlexNet | 128 | 4504 | 28.4 | 42.4 | 106.2 | 8860 | 41 | 216 |

15W Mode (Per-Rail)

The following provides a breakdown of the individual power rails, with each rail including regulator efficiency losses, for a subset of the ResNet-50 configurations:

| NETWORK | BATCH SIZE | PERF (img/sec) | SoC POWER (watts) | CPU POWER (watts) | GPU POWER (watts) | DLA POWER (watts) | DRAM POWER (watts) | 5V I/O POWER (watts) | MODULE POWER (watts) |
|---|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 2 | 508 | 2.2 | 1.1 | 3.5 | 1.4 | 1.9 | 2.7 | 12.8 |
| ResNet-50 | 8 | 717 | 2.3 | 1.1 | 4.6 | 1.4 | 2.2 | 2.8 | 14.4 |
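As a sanity check, the individual rails in the breakdown above sum (within rounding) to the reported module power:

```python
# Per-rail power for ResNet-50, batch size 2, 15W mode (watts),
# taken from the table above.
rails = {
    "SoC": 2.2,
    "CPU": 1.1,
    "GPU": 3.5,
    "DLA": 1.4,
    "DRAM": 1.9,
    "5V I/O": 2.7,
}
total = sum(rails.values())
print(round(total, 1))  # 12.8, matching the reported module power
```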

Notes

* Module power in MAX-N mode with the JetPack 4.1.1 Developer Preview release may exceed TDP for some configurations. Users should tune the power profile and configuration to stay within the TDP for their application. Future versions of JetPack will further optimize performance and power.

** Example trtexec launch commands:

For GPU

$ ./trtexec --avgRuns=100 --deploy=resnet50.prototxt --int8 --batch=8 --iterations=10000 --output=prob --useSpinWait

For DLA (Core 0)

$ ./trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useDLACore=0 --useSpinWait --allowGPUFallback

For DLA (Core 1)

$ ./trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useDLACore=1 --useSpinWait --allowGPUFallback

Multiple instances of trtexec can be launched simultaneously in this fashion for concurrent execution on the GPU and DLAs. DLA supports a maximum batch size of 32, depending on the network, while the GPU can concurrently run higher batch sizes.
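A minimal shell sketch of such a concurrent launch, reusing the flags from the examples above (the model file and batch sizes are illustrative):

```shell
#!/bin/sh
# Launch the GPU (INT8) and both DLA cores (FP16) concurrently,
# then wait for all three trtexec instances to finish.
./trtexec --avgRuns=100 --deploy=resnet50.prototxt --int8 --batch=8 \
          --iterations=10000 --output=prob --useSpinWait &
./trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 \
          --iterations=10000 --output=prob --useDLACore=0 --useSpinWait \
          --allowGPUFallback &
./trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 \
          --iterations=10000 --output=prob --useDLACore=1 --useSpinWait \
          --allowGPUFallback &
wait
```

Backgrounding each instance with `&` and joining with `wait` keeps all three engines busy for the duration of the run; this requires running on a Jetson device with TensorRT installed.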

Last updated: Nov 20, 2018 | NVIDIA Corporation | Subject to Change