Jetson is used to deploy a wide range of popular DNN models and ML frameworks to the edge with high-performance inferencing, for tasks such as real-time classification and object detection, pose estimation, semantic segmentation, and natural language processing (NLP). The table below shows inferencing benchmarks for popular vision DNNs across the Jetson family with the latest JetPack. These results can be reproduced by running the open jetson_benchmarks project from GitHub.
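As a rough sketch of how to reproduce these results on a Jetson device (the repository URL follows the project name above; the script names and flags are assumptions and may differ between releases, so verify them against the repository README):

```shell
# Clone NVIDIA's open benchmark suite named in the text above.
git clone https://github.com/NVIDIA-AI-IOT/jetson_benchmarks.git
cd jetson_benchmarks

# Download the pretrained models and run the full benchmark sweep.
# NOTE: these script names and arguments are assumptions -- check the
# repository README for the exact commands for your JetPack release.
mkdir models
python3 utils/download_models.py --save_dir models
sudo python3 benchmark.py --model_name all --model_dir models
```

Root privileges are typically needed so the benchmark can lock the clocks and power mode before measuring.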
| Model | Jetson Nano FPS (limited latency) | Jetson Nano FPS (max throughput) | Jetson TX2 FPS (limited latency) | Jetson TX2 FPS (max throughput) | Jetson Xavier NX FPS (limited latency) | Jetson Xavier NX FPS (max throughput) | Jetson AGX Xavier FPS (limited latency) | Jetson AGX Xavier FPS (max throughput) |
|---|---|---|---|---|---|---|---|---|
| Tiny YOLO V3 | | | | | | | | |
| BERT (seq length = 128) | BERT requires Volta or newer | BERT requires Volta or newer | BERT requires Volta or newer | BERT requires Volta or newer | 115 | 115 | 277 | 286 |

\* Latency more than 15 ms.
On Jetson Xavier NX and Jetson AGX Xavier, both NVIDIA Deep Learning Accelerator (NVDLA) engines and the GPU were run simultaneously with INT8 precision, while on Jetson Nano and Jetson TX2 the GPU was run with FP16 precision.
Each Jetson module was run at maximum performance:
- MAX-N mode for Jetson AGX Xavier
- 15W mode for Jetson Xavier NX and Jetson TX2
- 10W mode for Jetson Nano
Minimum latency results
- The limited-latency throughput results were obtained with the largest batch size that kept latency within 15 ms (50 ms for BERT); if no batch size met the limit, a batch size of one was used.
Maximum performance results
- The maximum throughput results were obtained without latency limitation and illustrate the maximum performance that can be achieved.
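The two reporting modes above can be sketched in a few lines. This is a minimal illustration of the batch-size selection logic, not NVIDIA's benchmarking code, and the latency measurements are made-up numbers:

```python
def fps(batch_size: int, latency_ms: float) -> float:
    """Throughput in frames per second for one batched inference."""
    return batch_size * 1000.0 / latency_ms

def limited_latency_fps(latency_by_batch: dict, limit_ms: float = 15.0) -> float:
    """FPS (limited latency): use the largest batch size whose latency
    stays within the limit; fall back to batch size 1 if none qualifies."""
    eligible = [b for b, ms in latency_by_batch.items() if ms <= limit_ms]
    batch = max(eligible) if eligible else 1
    return fps(batch, latency_by_batch[batch])

def max_throughput_fps(latency_by_batch: dict) -> float:
    """FPS (max throughput): best throughput over all batch sizes,
    with no latency cap."""
    return max(fps(b, ms) for b, ms in latency_by_batch.items())

# Illustrative measurements: latency grows sub-linearly with batch size.
measured = {1: 4.0, 4: 12.0, 8: 22.0, 16: 40.0}
print(limited_latency_fps(measured))  # batch 4 is the largest under 15 ms
print(max_throughput_fps(measured))   # batch 16 gives the best raw FPS
```

Note that the two numbers can come from different batch sizes, which is why the table reports them as separate columns.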
This methodology balances the deterministic low-latency requirements of real-time applications against the maximum performance achievable in multi-stream use cases. All results were obtained with JetPack 4.4 GA.