Jetson is used to deploy a wide range of popular DNN models and ML frameworks to the edge with high-performance inferencing, for tasks like real-time classification and object detection, pose estimation, semantic segmentation, and natural language processing (NLP). The table below shows inferencing benchmarks for popular vision DNNs across the Jetson family with the latest JetPack. These results can be reproduced by running the open-source jetson_benchmarks project from GitHub.
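For reference, a reproduction run looks roughly like the following. The script names, CSV file, and flags are taken from the jetson_benchmarks README at the time of writing and may change, so treat this as a sketch and consult the repository for current usage:

```shell
# Clone the benchmark suite (NVIDIA-AI-IOT/jetson_benchmarks on GitHub)
git clone https://github.com/NVIDIA-AI-IOT/jetson_benchmarks.git
cd jetson_benchmarks
mkdir -p models

# Download the pretrained models listed in the benchmark CSV
python3 utils/download_models.py --all \
    --csv_file_path ./benchmark_csv/nx-benchmarks.csv \
    --save_dir ./models

# Run all benchmarks (sudo is needed to lock clocks at maximum frequency)
sudo python3 benchmark.py --all \
    --csv_file_path ./benchmark_csv/nx-benchmarks.csv \
    --model_dir ./models
```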

Each cell shows FPS as: limited latency / max throughput.

Model (input size)             Jetson Nano    Jetson TX2 series    Jetson Xavier NX    Jetson AGX Xavier
Inception V4 (299x299)         11* / 13       24* / 32             320 / 405           528 / 704
VGG-19 (224x224)               10* / 12       23* / 29             67* / 313           276 / 432
Super Resolution (481x321)     15* / 15       33* / 33             164 / 166           281 / 302
Unet (256x256)                 17* / 17       39* / 39             166 / 166           240 / 251
OpenPose (256x456)             15* / 15       34* / 35             238 / 271           439 / 484
Tiny YOLO V3 (416x416)         48* / 49       107 / 112            607 / 618           1100 / 1127
ResNet-50 (224x224)            37* / 47       84 / 112             824 / 1100          1946 / 2109
SSD Mobilenet-V1 (300x300)     43* / 48       92 / 109             909 / 1058          1602 / 1919
SSD ResNet-34 (1200x1200)      1 / 1          3 / 2                29 / 29             55 / 55
BERT_BASE (seq length = 128)   BERT requires Volta or newer        115 / 115           277 / 286
BERT_LARGE (seq length = 128)  BERT requires Volta or newer        32 / 35             86 / 90

* Latency more than 15ms.


On Jetson Xavier NX and Jetson AGX Xavier, both NVIDIA Deep Learning Accelerator (NVDLA) engines and the GPU were run simultaneously with INT8 precision, while on Jetson Nano and Jetson TX2 the GPU was run with FP16 precision.
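As a concrete illustration of those precision and engine settings, TensorRT's trtexec tool can build and time an engine on a DLA core at INT8, or on the GPU at FP16. The model file name below is a placeholder, and an INT8 calibration cache is omitted for brevity:

```shell
# INT8 engine on DLA core 0 (Xavier-class modules), falling back to the
# GPU for any layers the DLA does not support
trtexec --onnx=model.onnx --int8 --useDLACore=0 --allowGPUFallback \
        --saveEngine=model_dla0_int8.engine

# FP16 engine on the GPU (the mode used here for Jetson Nano and TX2)
trtexec --onnx=model.onnx --fp16 --saveEngine=model_gpu_fp16.engine
```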

Notes:

  • Each Jetson module was run with maximum performance
    • MAX-N mode for Jetson AGX Xavier
    • 15W for Jetson Xavier NX and Jetson TX2
    • 10W for Jetson Nano
  • Limited-latency results
    • The limited-latency throughput results were obtained with the maximum batch size that did not exceed 15ms latency (50ms for BERT); where no batch size could meet the limit, a batch size of one was used (the starred entries in the table).
  • Maximum throughput results
    • The maximum throughput results were obtained without any latency limitation and illustrate the maximum performance that can be achieved.
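The limited-latency batch selection described above can be sketched in a few lines. The latency measurements below are hypothetical, not taken from the table:

```python
def pick_batch_size(latencies_ms, cap_ms=15.0):
    """Return (batch, fps) for the largest batch whose latency fits the cap.

    latencies_ms: dict mapping batch size -> measured per-batch latency (ms).
    Falls back to batch size 1 when even batch 1 exceeds the cap, which
    corresponds to the starred entries in the table above.
    """
    feasible = [b for b, lat in latencies_ms.items() if lat <= cap_ms]
    batch = max(feasible) if feasible else 1
    latency = latencies_ms[batch]
    fps = batch / (latency / 1000.0)
    return batch, fps

# Hypothetical per-batch latencies (ms) for one model
measured = {1: 4.0, 2: 7.5, 4: 14.0, 8: 26.0}
batch, fps = pick_batch_size(measured)
print(batch, round(fps))  # batch 4 fits the 15 ms budget -> ~286 FPS
```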

This methodology balances the deterministic low-latency requirements of real-time applications against maximum performance for multi-stream use cases. All results were obtained with JetPack 4.4.1.