AI Training

Deploying AI in real-world applications requires training networks to convergence at a specified accuracy. Measuring the time to train to that accuracy target is the best way to test whether AI systems are ready to be deployed in the field and deliver meaningful results.
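As a rough illustration of what "training to convergence at a specified accuracy" means in practice, the sketch below times a training loop until an evaluation metric reaches a target. It is not the MLPerf harness: the function `time_to_train`, its parameters, and the toy model (an "eval loss" that decays by 1% per step) are all hypothetical stand-ins chosen so the example runs on its own.

```python
import time
from typing import Callable, Optional, Tuple


def time_to_train(
    train_step: Callable[[], None],
    evaluate: Callable[[], float],
    target: float,
    lower_is_better: bool = True,
    eval_every: int = 10,
    max_steps: int = 10_000,
) -> Tuple[Optional[float], int]:
    """Train until `evaluate()` reaches `target`.

    Returns (elapsed_minutes, steps_run). elapsed_minutes is None when the
    target is not reached within `max_steps`.
    """
    start = time.perf_counter()
    for step in range(1, max_steps + 1):
        train_step()
        # Evaluate periodically; the clock keeps running during evaluation.
        if step % eval_every == 0:
            metric = evaluate()
            reached = metric <= target if lower_is_better else metric >= target
            if reached:
                return (time.perf_counter() - start) / 60.0, step
    return None, max_steps


# Hypothetical toy model: the "eval loss" shrinks by 1% per training step.
state = {"loss": 2.0}


def toy_train_step() -> None:
    state["loss"] *= 0.99


def toy_evaluate() -> float:
    return state["loss"]


# 0.925 is used here only because it matches the eval-loss style of target
# reported on this page; the toy model has nothing to do with Llama 2.
minutes, steps = time_to_train(toy_train_step, toy_evaluate, target=0.925)
print(f"reached target after {steps} steps")
```

The key design point is that the clock stops only when the quality target is met, so a faster step rate does not help if it costs extra steps to converge.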




NVIDIA Performance on MLPerf 5.1 Training Benchmarks


NVIDIA Performance on MLPerf 5.1’s AI Benchmarks: Single Node, Closed Division

| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | Llama2-70B-Lora | 6 | 0.925 Eval loss | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0058 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| NVIDIA NeMo | Llama2-70B-Lora | 8.5 | 0.925 Eval loss | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 5.1-0008 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| NVIDIA NeMo | Llama2-70B-Lora | 9 | 0.925 Eval loss | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0067 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama2-70B-Lora | 8.9 | 0.925 Eval loss | 8x B200 | 1xXE9680Lx8B200-SXM-180GB | 5.1-0030 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama 3.1 8B | 67.4 | 3.3 log perplexity | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0058 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| NVIDIA NeMo | Llama 3.1 8B | 75.8 | 3.3 log perplexity | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 5.1-0008 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| NVIDIA NeMo | Llama 3.1 8B | 79.3 | 3.3 log perplexity | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0067 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama 3.1 8B | 84.4 | 3.3 log perplexity | 8x B200 | SYS-422GS-NBRT-LCC | 5.1-0081 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | RetinaNet | 22.3 | 34.0% mAP | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0068 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| PyTorch | RetinaNet | 21.5 | 34.0% mAP | 8x B200 | AS-A126GS-TNBR | 5.1-0079 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| DGL | R-GAT | 5 | 72.0% classification | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0065 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| DGL | R-GAT | 4.9 | 72.0% classification | 8x B200 | AS-A126GS-TNBR | 5.1-0079 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 2.2 | 0.80275 AUC | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0066 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (GB200) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 2.3 | 0.80275 AUC | 8x B200 | G894-AD1 | 5.1-0040 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (B200-SXM-180GB) |

NVIDIA Performance on MLPerf 5.1’s AI Benchmarks: Multi Node, Closed Division

| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | Llama 3.1 405B | 64.6 | 5.6 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| NVIDIA NeMo | Llama 3.1 405B | 84.9 | 5.6 log perplexity | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama 3.1 405B | 18.8 | 5.6 log perplexity | 2,560x GB200 | hsg (40x NVIDIA GB200 NVL72) | 5.1-0003 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama 3.1 405B | 10 | 5.6 log perplexity | 5,120x GB200 | hsg (80x NVIDIA GB200 NVL72) | 5.1-0004 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama 3.1 405B | 256.3 | 5.6 log perplexity | 256x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0087 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama 3.1 405B | 147.2 | 5.6 log perplexity | 448x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0089 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama2-70B-Lora | 1.2 | 0.925 Eval loss | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| NVIDIA NeMo | Llama2-70B-Lora | 0.4 | 0.925 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| NVIDIA NeMo | Llama2-70B-Lora | 1.4 | 0.925 Eval loss | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0092 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama2-70B-Lora | 0.5 | 0.925 Eval loss | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama2-70B-Lora | 5.8 | 0.925 Eval loss | 16x B200 | Nebius B200 n2 (16x B200-SXM-180GB) | 5.1-0006 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama2-70B-Lora | 3.1 | 0.925 Eval loss | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama2-70B-Lora | 1.9 | 0.925 Eval loss | 128x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0085 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama 3.1 8B | 13 | 3.3 log perplexity | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| NVIDIA NeMo | Llama 3.1 8B | 5.2 | 3.3 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| NVIDIA NeMo | Llama 3.1 8B | 15 | 3.3 log perplexity | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0063 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama 3.1 8B | 5.4 | 3.3 log perplexity | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| NVIDIA NeMo | Llama 3.1 8B | 51.8 | 3.3 log perplexity | 16x B200 | Nebius B200 n2 (16x B200-SXM-180GB) | 5.1-0006 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama 3.1 8B | 27.8 | 3.3 log perplexity | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama 3.1 8B | 18.1 | 3.3 log perplexity | 64x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0090 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA NeMo | Llama 3.1 8B | 10 | 3.3 log perplexity | 256x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0087 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | Flux1 | 146.3 | 0.586 Eval loss | 16x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0059 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| PyTorch | Flux1 | 44.5 | 0.586 Eval loss | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| PyTorch | Flux1 | 17.1 | 0.586 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| PyTorch | Flux1 | 160.7 | 0.586 Eval loss | 16x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0069 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| PyTorch | Flux1 | 49.7 | 0.586 Eval loss | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0063 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| PyTorch | Flux1 | 17.9 | 0.586 Eval loss | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| PyTorch | Flux1 | 12.5 | 0.586 Eval loss | 1152x GB200 | hsg (18x NVIDIA GB200 NVL72) | 5.1-0002 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| PyTorch | Flux1 | 173.4 | 0.586 Eval loss | 16x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0086 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | Flux1 | 93.2 | 0.586 Eval loss | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | Flux1 | 54.5 | 0.586 Eval loss | 72x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0091 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | RetinaNet | 3.8 | 34.0% mAP | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0064 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| PyTorch | RetinaNet | 1.4 | 34.0% mAP | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0072 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| PyTorch | RetinaNet | 12.3 | 34.0% mAP | 16x B200 | 2xXE9680Lx8B200-SXM-180GB | 5.1-0037 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200) |
| PyTorch | RetinaNet | 10.1 | 34.0% mAP | 128x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0085 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200) |
| DGL | R-GAT | 1.1 | 72.0% classification | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0062 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| DGL | R-GAT | 0.8 | 72.0% classification | 256x GB200 | Tyche (4x NVIDIA GB200 NVL72) | 5.1-0070 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| DGL | R-GAT | 3.2 | 72.0% classification | 16x B200 | 2xXE9680Lx8B200-SXM-180GB | 5.1-0037 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (B200) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 0.7 | 0.80275 AUC | 64x GB200 | SRS-GB200-NVL72-M1 (16x ARS-121GL-NBO) | 5.0-0087 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (GB200) |

MLPerf™ v5.1 Training Closed: 5.0-0087, 5.1-0002, 5.1-0003, 5.1-0004, 5.1-0006, 5.1-0007, 5.1-0008, 5.1-0030, 5.1-0037, 5.1-0040, 5.1-0057, 5.1-0058, 5.1-0059, 5.1-0060, 5.1-0062, 5.1-0063, 5.1-0064, 5.1-0065, 5.1-0066, 5.1-0067, 5.1-0068, 5.1-0069, 5.1-0070, 5.1-0071, 5.1-0072, 5.1-0079, 5.1-0081, 5.1-0085, 5.1-0086, 5.1-0087, 5.1-0089, 5.1-0090, 5.1-0091, 5.1-0092 | MLPerf name and logo are trademarks. See https://mlcommons.org/ for more information.
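To make the multi-node results easier to interpret, the following sketch computes speedup and scaling efficiency relative to the smallest run of a benchmark. It uses the three Llama 3.1 405B results on GB200 listed above (512, 2,560, and 5,120 GPUs). The helper `scaling_report` is a hypothetical name, and the calculation assumes the runs are otherwise comparable, which the page does not state; the published times are also rounded, so the percentages are approximate.

```python
from typing import List, Tuple


def scaling_report(runs: List[Tuple[int, float]]) -> List[Tuple[int, float, float]]:
    """Return (num_gpus, speedup, efficiency) relative to the smallest run.

    runs: (num_gpus, time_to_train_minutes) pairs for a single benchmark.
    efficiency is measured speedup divided by ideal (linear) speedup.
    """
    ordered = sorted(runs)
    base_gpus, base_minutes = ordered[0]
    report = []
    for gpus, minutes in ordered:
        speedup = base_minutes / minutes
        ideal = gpus / base_gpus
        report.append((gpus, speedup, speedup / ideal))
    return report


# Llama 3.1 405B on GB200: (GPU count, time to train in minutes).
llama_405b_gb200 = [(512, 84.9), (2560, 18.8), (5120, 10.0)]

for gpus, speedup, efficiency in scaling_report(llama_405b_gb200):
    print(f"{gpus:>5} GPUs: {speedup:5.2f}x speedup, {efficiency:6.1%} of linear")
```

With these inputs, going from 512 to 2,560 GPUs retains roughly 90% of linear scaling, and going to 5,120 GPUs retains roughly 85%.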


LLM Training Performance on NVIDIA Data Center Products


GB300 Training Performance


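| Framework | Model | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | DeepSeek v3 | 4,965 tokens/sec/gpu | 256x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 2 | 1 | 32 | FP8 | 4096 | NVIDIA GB300 |
| NVIDIA NeMo | GPT-OSS 120B | 19,275 tokens/sec/gpu | 64x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 1 | 1 | 64 | BF16 | 1280 | NVIDIA GB300 |
| NVIDIA NeMo | Qwen3 30B a3B | 31,470 tokens/sec/gpu | 8x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 1 | 1 | 8 | FP8 | 512 | NVIDIA GB300 |
| NVIDIA NeMo | Qwen3 235B a22B | 6,994 tokens/sec/gpu | 256x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 4 | 1 | 32 | FP8 | 4096 | NVIDIA GB300 |
| NVIDIA NeMo | Nemotron 3 Nano | 38,102 tokens/sec/gpu | 8x GB300 | DGX GB300 | nemo:26.04 | 8192 | 1 | 1 | 1 | 8 | FP8 | 512 | NVIDIA GB300 |
| NVIDIA NeMo | Nemotron 3 Super | 9,623 tokens/sec/gpu | 64x GB300 | DGX GB300 | nemo:26.04 | 8192 | 1 | 1 | 1 | 64 | FP4 | 512 | NVIDIA GB300 |
| NVIDIA NeMo | Kimi K2 | 5,332 tokens/sec/gpu | 256x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 4 | 1 | 64 | FP8 | 4096 | NVIDIA GB300 |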

TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
EP: Expert Parallelism
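The parallelism columns can be related to the throughput figures with a little arithmetic. The sketch below uses the DeepSeek v3 row of the GB300 table (256 GPUs, 4,965 tokens/sec/gpu, sequence length 4096, global batch size 4096, TP 1, PP 2, CP 1) to derive the data-parallel size, the tokens per optimizer step, and an estimated step time. The class `RunConfig` is hypothetical. It also assumes the common Megatron-style convention that the GPU count factors as TP x PP x CP x data-parallel size, with expert parallelism carved out of the existing ranks rather than multiplying the GPU count; the page itself does not spell this out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    num_gpus: int
    tokens_per_sec_per_gpu: float
    seq_len: int
    global_batch_size: int  # sequences per optimizer step
    tp: int  # tensor parallelism
    pp: int  # pipeline parallelism
    cp: int  # context parallelism

    @property
    def data_parallel_size(self) -> int:
        # Assumption: num_gpus = TP * PP * CP * DP (EP does not add a factor).
        model_parallel = self.tp * self.pp * self.cp
        if self.num_gpus % model_parallel != 0:
            raise ValueError("GPU count is not divisible by TP * PP * CP")
        return self.num_gpus // model_parallel

    @property
    def tokens_per_step(self) -> int:
        return self.seq_len * self.global_batch_size

    @property
    def cluster_tokens_per_sec(self) -> float:
        return self.tokens_per_sec_per_gpu * self.num_gpus

    @property
    def seconds_per_step(self) -> float:
        return self.tokens_per_step / self.cluster_tokens_per_sec


# DeepSeek v3 on 256x GB300, values taken from the table above.
run = RunConfig(
    num_gpus=256,
    tokens_per_sec_per_gpu=4965,
    seq_len=4096,
    global_batch_size=4096,
    tp=1,
    pp=2,
    cp=1,
)

print(f"data-parallel size:   {run.data_parallel_size}")
print(f"tokens per step:      {run.tokens_per_step:,}")
print(f"cluster tokens/sec:   {run.cluster_tokens_per_sec:,.0f}")
print(f"est. seconds per step: {run.seconds_per_step:.1f}")
```

Under these assumptions the run has 128 data-parallel replicas, processes about 16.8 million tokens per step, and spends roughly 13 seconds per step.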

B300 Training Performance


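| Framework | Model | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | DeepSeek v3 | 3,131 tokens/sec/gpu | 256x B300 | DGX B300 | nemo:26.04 | 4096 | 1 | 4 | 1 | 64 | FP8 | 4096 | NVIDIA B300 |
| NVIDIA NeMo | GPT-OSS 120B | 15,114 tokens/sec/gpu | 64x B300 | DGX B300 | nemo:26.04 | 4096 | 1 | 1 | 1 | 8 | BF16 | 1280 | NVIDIA B300 |
| NVIDIA NeMo | Qwen3 235B a22B | 4,865 tokens/sec/gpu | 256x B300 | DGX B300 | nemo:26.04 | 4096 | 1 | 8 | 1 | 8 | FP8 | 8192 | NVIDIA B300 |
| NVIDIA NeMo | Nemotron 3 Super | 7,047 tokens/sec/gpu | 64x B300 | DGX B300 | nemo:26.04 | 8192 | 1 | 1 | 1 | 8 | FP4 | 512 | NVIDIA B300 |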

B200 Training Performance


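| Framework | Model | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | DeepSeek v3 | 2,815 tokens/sec/gpu | 256x B200 | DGX B200 | nemo:26.04 | 4096 | 1 | 16 | 1 | 8 | FP8 | 4096 | NVIDIA B200 |
| NVIDIA NeMo | GPT-OSS 120B | 13,045 tokens/sec/gpu | 64x B200 | DGX B200 | nemo:26.04 | 4096 | 1 | 1 | 1 | 8 | BF16 | 4096 | NVIDIA B200 |
| NVIDIA NeMo | Qwen3 30B a3B | 26,859 tokens/sec/gpu | 8x B200 | DGX B200 | nemo:26.04 | 4096 | 1 | 1 | 1 | 8 | FP8 | 512 | NVIDIA B200 |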

H100 Training Performance


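| Framework | Model | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | GPT-OSS 120B | 5,810 tokens/sec/gpu | 64x H100 | DGX H100 | nemo:26.04 | 4096 | 1 | 4 | 1 | 8 | BF16 | 1280 | H100-SXM5-80GB |
| NVIDIA NeMo | Qwen3 30B a3B | 8,901 tokens/sec/gpu | 16x H100 | DGX H100 | nemo:26.04 | 4096 | 1 | 1 | 1 | 16 | FP8 | 1024 | H100-SXM5-80GB |
| NVIDIA NeMo | Qwen3 235B a22B | 1,686 tokens/sec/gpu | 256x H100 | DGX H100 | nemo:26.04 | 4096 | 2 | 8 | 1 | 32 | FP8 | 8192 | H100-SXM5-80GB |
| NVIDIA NeMo | Nemotron 3 Nano | 14,507 tokens/sec/gpu | 16x H100 | DGX H100 | nemo:26.04 | 8192 | 1 | 1 | 1 | 8 | FP8 | 1024 | H100-SXM5-80GB |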




AI Inference

Real-world inferencing demands high throughput and low latencies with maximum efficiency across use cases. An industry-leading solution lets customers quickly deploy AI models into real-world production with the highest performance from data center to edge.


AI Pipeline

NVIDIA Riva is an application framework for multimodal conversational AI services that deliver real-time performance on GPUs.
