AI Training

Deploying AI in real-world applications requires training networks to convergence at a specified accuracy. Training to convergence is the best methodology for testing whether an AI system is ready to be deployed in the field and deliver meaningful results.
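The MLPerf results below all report time-to-train: the wall-clock time until a model reaches a fixed quality target (an eval loss, perplexity, mAP, or AUC threshold). A minimal sketch of that measurement, where `train_one_epoch` and `evaluate` are hypothetical placeholders for the benchmark's training and evaluation steps:

```python
# Sketch of MLPerf-style time-to-train measurement: train until a fixed
# quality target is reached, then report elapsed wall-clock time.
# `train_one_epoch` and `evaluate` are hypothetical placeholders.
import time

def time_to_train(train_one_epoch, evaluate, target_loss, max_epochs=100):
    """Train until eval loss drops to the quality target; return minutes."""
    start = time.perf_counter()
    for _ in range(max_epochs):
        train_one_epoch()
        if evaluate() <= target_loss:  # quality target reached
            return (time.perf_counter() - start) / 60.0
    raise RuntimeError("did not converge within max_epochs")
```

Because the clock only stops at the quality target, the metric rewards end-to-end convergence speed rather than raw step throughput.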




NVIDIA Performance on MLPerf 5.1 Training Benchmarks


NVIDIA Performance on MLPerf 5.1’s AI Benchmarks: Single Node, Closed Division

| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NeMo | Llama2-70B-LoRA | 6 | 0.925 Eval loss | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0058 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 8.5 | 0.925 Eval loss | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 5.1-0008 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| | | 9 | 0.925 Eval loss | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0067 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 8.9 | 0.925 Eval loss | 8x B200 | 1xXE9680Lx8B200-SXM-180GB | 5.1-0030 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama3.1 8B | 67.4 | 3.3 log perplexity | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0058 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 75.8 | 3.3 log perplexity | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 5.1-0008 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| | | 79.3 | 3.3 log perplexity | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0067 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 84.4 | 3.3 log perplexity | 8x B200 | SYS-422GS-NBRT-LCC | 5.1-0081 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | RetinaNet | 22.3 | 34.0% mAP | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0068 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 21.5 | 34.0% mAP | 8x B200 | AS-A126GS-TNBR | 5.1-0079 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| DGL | R-GAT | 5 | 72.0% classification | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0065 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 4.9 | 72.0% classification | 8x B200 | AS-A126GS-TNBR | 5.1-0079 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 2.2 | 0.80275 AUC | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0066 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (GB200) |
| | | 2.3 | 0.80275 AUC | 8x B200 | G894-AD1 | 5.1-0040 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (B200-SXM-180GB) |
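Generational comparisons can be read directly from the single-node table; for example, the Llama2-70B-LoRA rows give the GB300-over-GB200 speedup at equal GPU count (figures taken from the table above):

```python
# GB300 vs GB200 speedup on single-node Llama2-70B-LoRA fine-tuning,
# using the time-to-train figures (minutes) from the table above.
gb300_minutes = 6.0  # 8x GB300, MLPerf-ID 5.1-0058
gb200_minutes = 9.0  # 8x GB200, MLPerf-ID 5.1-0067

speedup = gb200_minutes / gb300_minutes
print(f"GB300 vs GB200 speedup: {speedup:.1f}x")  # 1.5x
```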

NVIDIA Performance on MLPerf 5.1’s AI Benchmarks: Multi Node, Closed Division

| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | Llama 3.1 405B | 64.6 | 5.6 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 84.9 | 5.6 log perplexity | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 18.8 | 5.6 log perplexity | 2,560x GB200 | hsg (40x NVIDIA GB200 NVL72) | 5.1-0003 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 10 | 5.6 log perplexity | 5,120x GB200 | hsg (80x NVIDIA GB200 NVL72) | 5.1-0004 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 256.3 | 5.6 log perplexity | 256x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0087 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 147.2 | 5.6 log perplexity | 448x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0089 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama2-70B-LoRA | 1.2 | 0.925 Eval loss | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 0.4 | 0.925 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 1.4 | 0.925 Eval loss | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0092 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 0.5 | 0.925 Eval loss | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 5.8 | 0.925 Eval loss | 16x B200 | Nebius B200 n2 (16x B200-SXM-180GB) | 5.1-0006 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 3.1 | 0.925 Eval loss | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 1.9 | 0.925 Eval loss | 128x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0085 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama 3.1 8B | 13 | 3.3 log perplexity | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 5.2 | 3.3 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 15 | 3.3 log perplexity | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0063 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 5.4 | 3.3 log perplexity | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 51.8 | 3.3 log perplexity | 16x B200 | Nebius B200 n2 (16x B200-SXM-180GB) | 5.1-0006 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 27.8 | 3.3 log perplexity | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 18.1 | 3.3 log perplexity | 64x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0090 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 10 | 3.3 log perplexity | 256x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0087 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | Flux1 | 146.3 | 0.586 Eval loss | 16x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0059 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 44.5 | 0.586 Eval loss | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 17.1 | 0.586 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 160.7 | 0.586 Eval loss | 16x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0069 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 49.7 | 0.586 Eval loss | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0063 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 17.9 | 0.586 Eval loss | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 12.5 | 0.586 Eval loss | 1,152x GB200 | hsg (18x NVIDIA GB200 NVL72) | 5.1-0002 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 173.4 | 0.586 Eval loss | 16x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0086 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 93.2 | 0.586 Eval loss | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 54.5 | 0.586 Eval loss | 72x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0091 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | RetinaNet | 3.8 | 34.0% mAP | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0064 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 1.4 | 34.0% mAP | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0072 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 12.3 | 34.0% mAP | 16x B200 | 2xXE9680Lx8B200-SXM-180GB | 5.1-0037 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200) |
| | | 10.1 | 34.0% mAP | 128x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0085 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200) |
| DGL | R-GAT | 1.1 | 72.0% classification | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0062 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 0.8 | 72.0% classification | 256x GB200 | Tyche (4x NVIDIA GB200 NVL72) | 5.1-0070 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 3.2 | 72.0% classification | 16x B200 | 2xXE9680Lx8B200-SXM-180GB | 5.1-0037 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (B200) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 0.7 | 0.80275 AUC | 64x GB200 | SRS-GB200-NVL72-M1 (16x ARS-121GL-NBO) | 5.0-0087 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (GB200) |
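The multi-node Llama 3.1 405B rows double as a scaling study. Scaling efficiency is actual speedup divided by ideal (linear) speedup; computed from the 512-GPU and 2,560-GPU GB200 figures in the table above:

```python
# Scaling efficiency for Llama 3.1 405B pretraining on GB200, from the
# multi-node table above: 512 GPUs -> 84.9 min, 2,560 GPUs -> 18.8 min.
def scaling_efficiency(base_gpus, base_minutes, scaled_gpus, scaled_minutes):
    actual_speedup = base_minutes / scaled_minutes
    ideal_speedup = scaled_gpus / base_gpus
    return actual_speedup / ideal_speedup

eff = scaling_efficiency(512, 84.9, 2560, 18.8)
print(f"Scaling efficiency at 5x the GPUs: {eff:.0%}")  # ~90%
```

Roughly 90% of linear scaling is retained across a 5x increase in GPU count.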

MLPerf™ v5.1 Training Closed: 5.0-0087, 5.1-0002, 5.1-0003, 5.1-0004, 5.1-0006, 5.1-0007, 5.1-0008, 5.1-0030, 5.1-0037, 5.1-0040, 5.1-0057, 5.1-0058, 5.1-0059, 5.1-0060, 5.1-0062, 5.1-0063, 5.1-0064, 5.1-0065, 5.1-0066, 5.1-0067, 5.1-0068, 5.1-0069, 5.1-0070, 5.1-0071, 5.1-0072, 5.1-0079, 5.1-0081, 5.1-0085, 5.1-0086, 5.1-0087, 5.1-0089, 5.1-0090, 5.1-0091, 5.1-0092 | MLPerf name and logo are trademarks. See https://mlcommons.org/ for more information.
MLPerf Training rules and guidelines are available from MLCommons.


LLM Training Performance on NVIDIA Data Center Products


B200 Training Performance



| Framework | Model | Time to Train (days) | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NeMo | Llama3 8B | 0.4 | 30,131 tokens/sec | 8x B200 | DGX B200 | nemo:25.07 | 8192 | 1 | 1 | 1 | FP8 | 128 | NVIDIA B200 |
| | Llama3 70B | 3.1 | 3,690 tokens/sec | 64x B200 | DGX B200 | nemo:25.07 | 8192 | 1 | 1 | 1 | FP8 | 128 | NVIDIA B200 |
| | Llama3 405B | 16.8 | 674 tokens/sec | 128x B200 | DGX B200 | nemo:25.07 | 8192 | 4 | 8 | 2 | FP8 | 64 | NVIDIA B200 |
| | Llama4 Scout | 1 | 11,260 tokens/sec | 64x B200 | DGX B200 | nemo:25.07 | 8192 | 1 | 2 | 1 | FP8 | 1024 | NVIDIA B200 |
| | Llama4 Maverick | 1.2 | 9,811 tokens/sec | 128x B200 | DGX B200 | nemo:25.07 | 8192 | 1 | 2 | 1 | FP8 | 1024 | NVIDIA B200 |
| | Mixtral 8x7B | 0.7 | 16,384 tokens/sec | 64x B200 | DGX B200 | nemo:25.07 | 4096 | 1 | 1 | 1 | FP8 | 256 | NVIDIA B200 |
| | Mixtral 8x22B | 4.4 | 2,548 tokens/sec | 256x B200 | DGX B200 | nemo:25.07 | 65536 | 2 | 4 | 8 | FP8 | 64 | NVIDIA B200 |
| | Nemotron5-H56B | 2.7 | 4,123 tokens/sec | 64x B200 | DGX B200 | nemo:25.07 | 8192 | 4 | 1 | 1 | FP8 | 192 | NVIDIA B200 |
| | DeepSeek v3 | 6.9 | 1,640 tokens/sec | 256x B200 | DGX B200 | nemo:25.07 | 4096 | 2 | 16 | 1 | FP8 | 2048 | NVIDIA B200 |

TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
Time to Train is the estimated time to train on 1T tokens with 1,000 GPUs.
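The footnote above pins down the estimate: with the per-GPU throughput from the table, time to train is tokens divided by (GPUs x tokens/sec/GPU x 86,400 seconds/day). A sanity check against the Llama3 rows:

```python
# Estimated time to train on 1T tokens with 1,000 GPUs, per the footnote:
# days = tokens / (gpus * tokens_per_sec_per_gpu * 86400 sec/day).
def days_to_train(tokens_per_sec_per_gpu, gpus=1000, tokens=1e12):
    return tokens / (gpus * tokens_per_sec_per_gpu * 86400)

print(round(days_to_train(30131), 1))  # Llama3 8B row: 0.4 days
print(round(days_to_train(3690), 1))   # Llama3 70B row: 3.1 days
```

Both results reproduce the Time to Train column for those rows.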



View More Performance Data

AI Inference

Real-world inference demands high throughput and low latency with maximum efficiency across use cases. An industry-leading solution lets customers quickly deploy AI models into real-world production with the highest performance from data center to edge.

Learn More

AI Pipeline

NVIDIA Riva is an application framework for multimodal conversational AI services that deliver real-time performance on GPUs.

Learn More