Jetson Benchmarks

Jetson is used to deploy a wide range of popular DNN models, optimized transformer models, and ML frameworks to the edge for high-performance inferencing, in tasks such as real-time classification and object detection, pose estimation, semantic segmentation, and natural language processing (NLP).
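The models in the tables below are executed through TensorRT. As a rough illustration of what that looks like in practice, here is a minimal sketch of loading a prebuilt, static-shape TensorRT engine and running one inference from Python (TensorRT 8.x-style API with PyCUDA). The engine file name and the assumption that binding 0 is the input and the last binding is the output are illustrative only, not taken from the MLPerf submissions:

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

# Deserialize an engine built beforehand, e.g. with:
#   trtexec --onnx=resnet50.onnx --saveEngine=resnet50.engine --fp16
with open("resnet50.engine", "rb") as f:          # hypothetical file name
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one host/device buffer pair per binding (static shapes assumed).
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.empty(trt.volume(shape), dtype=dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Fill the input, run synchronously, and read the output back.
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)
cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
context.execute_v2(bindings)
cuda.memcpy_dtoh(host_bufs[-1], dev_bufs[-1])
print("top-1 class:", int(np.argmax(host_bufs[-1])))
```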

MLPerf Inference Benchmarks

The tables below show inferencing benchmarks from the NVIDIA Jetson submissions to the MLPerf Inference Edge category.


Jetson AGX Orin MLPerf v4.0 Results


Platform: NVIDIA Jetson AGX Orin (TensorRT)

| Task | Model | Single Stream Latency (ms) | Offline (Samples/s) |
|------|-------|----------------------------|---------------------|
| LLM Summarization | GPT-J 6B | 10204.46 | 0.15 |
| Image Generation | stable-diffusion-xl | 12941.92 | 0.08 |


Jetson AGX Orin and Jetson Orin NX MLPerf v3.1 Results


Platforms: AGX Orin = NVIDIA Jetson AGX Orin (TensorRT); Orin MaxQ = NVIDIA Orin MaxQ (TensorRT); Orin NX = NVIDIA Jetson Orin NX; Orin NX MaxQ = NVIDIA Jetson Orin NX MaxQ

| Task | Model | AGX Orin Single Stream Latency (ms) | AGX Orin Offline (Samples/s) | Orin MaxQ Multi Stream Latency (ms) | Orin MaxQ Offline (Samples/s) | Orin MaxQ System Power (W) | Orin NX Offline (Samples/s) | Orin NX MaxQ Offline (Samples/s) | Orin NX MaxQ System Power (W) |
|------|-------|------|------|------|------|------|------|------|------|
| Image Classification | ResNet | 0.64 | 6423.63 | 2.18 | 3526.29 | 23.57 | 2640.51 | 1681.87 | 14.95 |
| Object Detection | Retinanet | 11.67 | 148.71 | 82.92 | 74.71 | 22.27 | 66.5 | 47.59 | 15.57 |
| Medical Imaging | 3D-Unet-99.0 | 4371.46 | 0.51 | N/A | N/A | N/A | 0.2 | 0.19 | 22.04 |
| Speech-to-text | RNN-T | 94.01 | 1169.98 | N/A | N/A | N/A | 431.92 | 327.79 | 17.25 |
| Natural Language Processing | BERT | 5.71 | 553.69 | N/A | N/A | N/A | 194.5 | 136.59 | 17.04 |


Jetson AGX Orin and Jetson Orin NX MLPerf v3.0 Results


Platforms: AGX Orin = NVIDIA Jetson AGX Orin (TensorRT); Orin MaxQ = NVIDIA Orin MaxQ (TensorRT); Orin NX = NVIDIA Jetson Orin NX

| Task | Model | AGX Orin Single Stream (Samples/s) | AGX Orin Offline (Samples/s) | Orin MaxQ Multi Stream (Samples/s) | Orin MaxQ Offline (Samples/s) | Orin MaxQ System Power (W) | Orin NX Offline (Samples/s) |
|------|-------|------|------|------|------|------|------|
| Image Classification | ResNet-50 | 1538 | 6438.10 | 3686 | 3525.91 | 23.06 | 2517.99 |
| Object Detection | Retinanet | 51.57 | 92.40 | 60.00 | 34.6 | 22.4 | 36.14 |
| Medical Imaging | 3D-Unet | 0.26 | 0.51 | N/A | 3.28 | 28.64 | 0.19 |
| Speech-to-text | RNN-T | 9.822 | 1170.23 | N/A | 14472 | 25.64 | 405.27 |
| Natural Language Processing | BERT | 144.36 | 544.24 | N/A | 3685.36 | 25.91 | 163.57 |

  • Steps to reproduce these results can be found at v3.0 Results | MLCommons.
  • These results were achieved with the NVIDIA Jetson AGX Orin Developer Kit running a preview of TensorRT 8.5.0 and CUDA 11.4.
  • Note that different configurations were used for the single stream, offline, and multi-stream scenarios; see the MLCommons page for details and the sketch below for how a scenario is selected.
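The single stream, offline, and multi-stream numbers above come from different MLPerf LoadGen scenarios rather than different models. As a rough sketch of how a scenario is selected, the snippet below uses the MLCommons LoadGen Python bindings (mlperf_loadgen) with a dummy in-memory "model" standing in for the TensorRT engine; the callback names, dataset, and sample counts are illustrative, and the exact ConstructSUT/ConstructQSL signatures can differ slightly between LoadGen releases:

```python
import array
import numpy as np
import mlperf_loadgen as lg

# Dummy data and "model": a lookup table stands in for the real TensorRT engine.
DATASET = np.random.rand(1024, 8).astype(np.float32)

def issue_queries(query_samples):
    responses = []
    for s in query_samples:
        out = DATASET[s.index]                     # run the real model here
        buf = array.array("B", out.tobytes())
        addr, length = buf.buffer_info()
        responses.append(lg.QuerySampleResponse(s.id, addr, length))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

def load_samples(indices):                         # QSL callbacks: (un)load samples
    pass

def unload_samples(indices):
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream   # or Offline / MultiStream
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(len(DATASET), 256, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```

In the SingleStream scenario LoadGen issues one query at a time and reports latency, while in Offline it issues the whole dataset at once and reports throughput, which is why the tables list both latency (ms) and samples/s.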

Gen AI Benchmarks

NVIDIA Jetson AI Lab is a collection of tutorials showing how to run optimized models on NVIDIA Jetson, including the latest generative AI and transformer models. The tutorials span a variety of model modalities, such as LLMs (text), VLMs (text and vision), ViTs (Vision Transformers), image generation, and ASR/TTS (audio).
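For generative models, throughput is typically reported in tokens per second rather than samples per second. As a purely illustrative sketch (not the method used to produce the Jetson AI Lab numbers, and with an example model name that is not from these benchmarks), token throughput could be measured along these lines with Hugging Face transformers:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"    # example model, not from the benchmarks
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tok("Summarize the benefits of edge inference.", return_tensors="pt").to("cuda")
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens when computing throughput.
new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```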

Large Language Models (LLM)

Small Language Models (SLM)

Vision Transformers (ViT)

Riva