Jetson is used to deploy a wide range of popular DNN models and ML frameworks to the edge with high-performance inferencing, for tasks such as real-time classification and object detection, pose estimation, semantic segmentation, and natural language processing (NLP). The table below shows inferencing benchmarks for popular vision DNNs across the Jetson family with the latest JetPack. These results can be reproduced by running the open-source jetson_benchmarks project from GitHub.

Each cell shows FPS (limited latency) / FPS (max throughput).

| Model | Jetson Nano | Jetson TX2 | Jetson Xavier NX | Jetson AGX Xavier |
|---|---|---|---|---|
| Inception V4 (299x299) | 11* / 13 | 24* / 32 | 320 / 405 | 528 / 704 |
| VGG-19 (224x224) | 10* / 12 | 23* / 29 | 67* / 313 | 276 / 432 |
| Super Resolution (481x321) | 15* / 15 | 33* / 33 | 164 / 166 | 281 / 302 |
| Unet (256x256) | 17* / 17 | 39* / 39 | 166 / 166 | 240 / 251 |
| OpenPose (256x456) | 15* / 15 | 34* / 35 | 238 / 271 | 439 / 484 |
| Tiny YOLO V3 (416x416) | 48* / 49 | 107 / 112 | 607 / 618 | 1100 / 1127 |
| ResNet-50 (224x224) | 37* / 47 | 84 / 112 | 824 / 1100 | 1946 / 2109 |
| SSD Mobilenet-V1 (300x300) | 43* / 48 | 92 / 109 | 909 / 1058 | 1602 / 1919 |
| BERT_BASE (seq length = 128) | requires Volta or newer | requires Volta or newer | 115 / 115 | 277 / 286 |
| BERT_LARGE (seq length = 128) | requires Volta or newer | requires Volta or newer | 32 / 35 | 86 / 90 |

* Latency exceeds 15 ms (result obtained with a batch size of one; see notes below).


On Jetson Xavier NX and Jetson AGX Xavier, both NVIDIA Deep Learning Accelerator (NVDLA) engines and the GPU were run simultaneously with INT8 precision, while on Jetson Nano and Jetson TX2 the GPU was run with FP16 precision.
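As a rough illustration of what INT8 precision means here, the sketch below shows symmetric per-tensor quantization. This is not TensorRT's calibration algorithm (which derives scales from representative activation data, not a single max); it only shows the basic float-to-int8 mapping and its error bound.

```python
def quantize_int8(values):
    """Map floats to int8 codes using one symmetric scale per tensor."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale = quantize_int8(weights)
recovered = dequantize(codes, scale)
# Round-trip error is bounded by half the quantization step (scale / 2).
```

The appeal on Xavier-class hardware is that both the GPU's INT8 tensor cores and the NVDLA engines natively execute these 8-bit operations, trading this bounded rounding error for higher throughput.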

Notes:

  • Each Jetson module was run at its maximum performance setting
    • MAX-N mode for Jetson AGX Xavier
    • 15W mode for Jetson Xavier NX and Jetson TX2
    • 10W mode for Jetson Nano
  • Limited-latency results
    • The limited-latency throughput figures were obtained with the largest batch size whose latency did not exceed 15 ms (50 ms for BERT); otherwise, a batch size of one was used.
  • Maximum-throughput results
    • The maximum-throughput figures were obtained with no latency limitation and illustrate the peak performance that can be achieved.
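The limited-latency selection rule above can be sketched in a few lines. The latency model below is invented for illustration only, not measured Jetson data.

```python
def pick_batch_size(latency_ms, limit_ms=15.0, max_batch=64):
    """Largest batch size whose per-batch latency stays within limit_ms.

    latency_ms: callable mapping a batch size to its measured latency in
    milliseconds. Falls back to batch size 1 when even that exceeds the
    limit (those are the asterisked entries in the table).
    """
    best = 1
    for batch in range(1, max_batch + 1):
        if latency_ms(batch) <= limit_ms:
            best = batch
    return best

# Toy latency model: 4 ms fixed overhead plus 2 ms per image in the batch.
toy_latency = lambda b: 4.0 + 2.0 * b

batch = pick_batch_size(toy_latency)       # -> 5 (14 ms; batch 6 would hit 16 ms)
fps = batch * 1000.0 / toy_latency(batch)  # throughput at the chosen batch size
```

Larger batches amortize fixed overhead, which is why the max-throughput column is never lower than the limited-latency one.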

This methodology balances the deterministic low-latency requirements of real-time applications against maximum performance for multi-stream use cases. All results were obtained with JetPack 4.4 GA.