AI Training

Deploying AI in real-world applications requires training networks to convergence at a specified accuracy. Measuring the time to train to that accuracy target is the best methodology for testing whether AI systems are ready to be deployed in the field and deliver meaningful results.
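As a rough illustration of what "time to train" means, the sketch below times a training loop until an evaluation metric reaches a quality target. It is a simplified sketch and not the MLPerf reference harness: the callables `train_one_epoch` and `evaluate`, the per-epoch evaluation cadence, and the `max_epochs` cap are all assumptions made for the example; the actual timing and evaluation rules are defined by the benchmark.

```python
import time
from typing import Callable


def time_to_train(
    train_one_epoch: Callable[[], None],   # hypothetical: runs one unit of training
    evaluate: Callable[[], float],         # hypothetical: returns the current eval metric
    target: float,
    lower_is_better: bool = True,          # True for loss/perplexity, False for AUC
    max_epochs: int = 100,
) -> tuple[float, int]:
    """Return (elapsed_seconds, epochs_run) once the metric meets the target."""
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        metric = evaluate()
        reached = metric <= target if lower_is_better else metric >= target
        if reached:
            return time.perf_counter() - start, epoch
    raise RuntimeError("quality target not reached within max_epochs")


# Toy usage: a made-up eval-loss curve checked against a 0.925 eval-loss target.
toy_losses = iter([1.2, 1.0, 0.93, 0.92])
elapsed, epochs = time_to_train(lambda: None, lambda: next(toy_losses), target=0.925)
print(f"reached target after {epochs} evaluations in {elapsed:.6f} s")
```

The same helper covers higher-is-better targets such as AUC by passing `lower_is_better=False`.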




NVIDIA Performance on MLPerf 6.0 Training Benchmarks


NVIDIA Performance on MLPerf 6.0’s AI Benchmarks: Single Node, Closed Division

| Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf ID | Precision | Dataset | Framework | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| Llama2-70B-Lora | 11.6 | 0.925 Eval loss | 4x GB300 | 1xXE9712x4GB300 | 6.0-0048 | Mixed | SCROLLS GovReport | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama2-70B-Lora | 6.6 | 0.925 Eval loss | 8x B300 | XA_NB3I-E12 | 6.0-0038 | Mixed | SCROLLS GovReport | PyTorch | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| Llama2-70B-Lora | 7.9 | 0.925 Eval loss | 8x B200 | AS-4126GS-NBR-LCC | 6.0-0107 | Mixed | SCROLLS GovReport | NVIDIA NeMo | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| Llama3.1 8B | 123.7 | 3.3 log perplexity | 4x GB300 | 1xXE9712x4GB300 | 6.0-0048 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama3.1 8B | 72.0 | 3.3 log perplexity | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 6.0-0023 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| Llama3.1 8B | 82.2 | 3.3 log perplexity | 8x B200 | 1xXE9780x8B200-SXM-180GB | 6.0-0052 | Mixed | c4/en/3.0.1 | DGL | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| GPT-OSS 20B | 152.7 | 3.34 log perplexity | 4x GB300 | 1xXE9712x4GB300 | 6.0-0048 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| GPT-OSS 20B | 83.6 | 3.34 log perplexity | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 6.0-0023 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| GPT-OSS 20B | 96.5 | 3.34 log perplexity | 8x B200 | Lambda-1-Click-Cluster_B200_n1 | 6.0-0008 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| DLRM-dcnv2 | 2.2 | 0.80275 AUC | 8x B300 | G894-SD3-AAX7 | 6.0-0065 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Merlin HugeCTR | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| DLRM-dcnv2 | 2.3 | 0.80275 AUC | 8x B200 | SYS-A22GA-NBRT | 6.0-0113 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Merlin HugeCTR | NVIDIA Blackwell GPU (B200-SXM-180GB) |

NVIDIA Performance on MLPerf 6.0’s AI Benchmarks: Multi Node, Closed Division

| Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf ID | Precision | Dataset | Framework | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| Llama 3.1 405B | 58.3 | 5.6 log perplexity | 512x GB300 | Theia-cmh (8x NVIDIA GB300 NVL72) | 6.0-0013 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama 3.1 405B | 18.5 | 5.6 log perplexity | 2,048x GB300 | Theia-cmh (32x NVIDIA GB300 NVL72) | 6.0-0012 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama 3.1 405B | 9.8 | 5.6 log perplexity | 4,096x GB300 | CoreWeave_GB300_1024x4_nccl2297 | 6.0-0004 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama 3.1 405B | 18.8 | 5.6 log perplexity | 2,560x GB200 | Tyche-hsg (40x NVIDIA GB200 NVL72) | 6.0-0019 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama 3.1 405B | 10.0 | 5.6 log perplexity | 5,120x GB200 | Tyche-hsg (80x NVIDIA GB200 NVL72) | 6.0-0021 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama 3.1 405B | 7.1 | 5.6 log perplexity | 8,192x GB200 | Azure GB200 (128x NVIDIA GB200 NVL72) | 6.0-0001 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| DeepSeek v3 671B | 33.4 | 3.6 log perplexity | 256x GB300 | Theia (4x NVIDIA GB300 NVL72) | 6.0-0099 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| DeepSeek v3 671B | 17.5 | 3.6 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 6.0-0101 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| DeepSeek v3 671B | 5.5 | 3.6 log perplexity | 2,048x GB300 | CoreWeave_GB300_512x4 | 6.0-0006 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| DeepSeek v3 671B | 3.1 | 3.6 log perplexity | 4,096x GB300 | CoreWeave_GB300_1024x4 | 6.0-0003 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| DeepSeek v3 671B | 2.0 | 3.6 log perplexity | 8,192x GB300 | CoreWeave_GB300_2048x4 | 6.0-0005 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| DeepSeek v3 671B | 49.4 | 3.6 log perplexity | 256x GB200 | NVIDIA GB200 NVL72 (64 nodes, 4 NVLink domains) | 6.0-0007 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell GPU (GB200) |
| DeepSeek v3 671B | 27.6 | 3.6 log perplexity | 512x GB200 | Tyche-hsg (8x NVIDIA GB200 NVL72) | 6.0-0022 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| DeepSeek v3 671B | 7.8 | 3.6 log perplexity | 2,048x GB200 | Tyche-hsg (32x NVIDIA GB200 NVL72) | 6.0-0018 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| DeepSeek v3 671B | 4.8 | 3.6 log perplexity | 4,096x GB200 | Tyche-hsg (64x NVIDIA GB200 NVL72) | 6.0-0020 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| DeepSeek v3 671B | 3.3 | 3.6 log perplexity | 8,192x GB200 | Tyche-hsg (128x NVIDIA GB200 NVL72) | 6.0-0014 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama2-70B-Lora | 5.6 | 0.925 Eval loss | 8x GB300 | BM.GPU.GB300.4 | 6.0-0031 | Mixed | SCROLLS GovReport | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama2-70B-Lora | 2.5 | 0.925 Eval loss | 32x GB300 | NVIDIA GB300 NVL72 by HPE | 6.0-0076 | Mixed | SCROLLS GovReport | NVIDIA NeMo/PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama2-70B-Lora | 1.3 | 0.925 Eval loss | 64x GB300 | D75U-1U_ngpu64 | 6.0-0104 | Mixed | SCROLLS GovReport | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama2-70B-Lora | 0.4 | 0.925 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 6.0-0101 | Mixed | SCROLLS GovReport | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama2-70B-Lora | 4.0 | 0.925 Eval loss | 16x B300 | Cisco UCS C880A-8xB300-SXM-288G | 6.0-0040 | Mixed | SCROLLS GovReport | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| Llama2-70B-Lora | 5.3 | 0.925 Eval loss | 16x GB200 | NVIDIA GB200 NVL72 by HPE | 6.0-0071 | Mixed | SCROLLS GovReport | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama2-70B-Lora | 2.9 | 0.925 Eval loss | 32x GB200 | NVIDIA GB200 NVL72 by HPE | 6.0-0072 | Mixed | SCROLLS GovReport | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama2-70B-Lora | 6.3 | 0.925 Eval loss | 16x B200 | HPE ProLiant Compute XD685 | 6.0-0069 | Mixed | SCROLLS GovReport | NVIDIA NeMo | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| Llama3.1 8B | 63.5 | 3.3 log perplexity | 8x GB300 | NVIDIA GB300 NVL72 by HPE | 6.0-0078 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama 3.1 8B | 33.4 | 3.3 log perplexity | 16x GB300 | NVIDIA GB300 NVL72 by HPE | 6.0-0075 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama 3.1 8B | 20.2 | 3.3 log perplexity | 32x GB300 | 8xXE9712x4GB300 | 6.0-0060 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama 3.1 8B | 11.6 | 3.3 log perplexity | 72x GB300 | Lambda_GB300_n18 | 6.0-0009 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama 3.1 8B | 4.6 | 3.3 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 6.0-0101 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Llama 3.1 8B | 43.3 | 3.3 log perplexity | 16x B300 | 2xXE9780x8B300-SXM-270GB | 6.0-0058 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| Llama 3.1 8B | 14.4 | 3.3 log perplexity | 64x B300 | Nebius B300 n8 (64x B300-SXM-270GB) | 6.0-0025 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| Llama3.1 8B | 79.7 | 3.3 log perplexity | 8x GB200 | NVIDIA GB200 NVL72 by HPE | 6.0-0074 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama 3.1 8B | 49.0 | 3.3 log perplexity | 16x GB200 | NVIDIA GB200 NVL72 by HPE | 6.0-0071 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama 3.1 8B | 39.0 | 3.3 log perplexity | 32x GB200 | NVIDIA GB200 NVL72 by HPE | 6.0-0072 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama 3.1 8B | 4.5 | 3.3 log perplexity | 1,024x GB200 | Tyche-hsg (16x NVIDIA GB200 NVL72) | 6.0-0015 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| Llama 3.1 8B | 52.3 | 3.3 log perplexity | 16x B200 | HPE ProLiant Compute XD685 | 6.0-0069 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| Llama 3.1 8B | 16.5 | 3.3 log perplexity | 64x B200 | CoreWeave_B200_8x8 | 6.0-0002 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| GPT-OSS 20B | 74.1 | 3.34 log perplexity | 8x GB300 | NVIDIA GB300 NVL72 by HPE | 6.0-0078 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| GPT-OSS 20B | 43.2 | 3.34 log perplexity | 16x GB300 | NVIDIA GB300 NVL72 by HPE | 6.0-0075 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| GPT-OSS 20B | 27.9 | 3.34 log perplexity | 32x GB300 | NVIDIA GB300 NVL72 by HPE | 6.0-0076 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| GPT-OSS 20B | 18.1 | 3.34 log perplexity | 72x GB300 | DLB2-CB3 | 6.0-0063 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| GPT-OSS 20B | 7.4 | 3.34 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 6.0-0101 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| GPT-OSS 20B | 53.9 | 3.34 log perplexity | 16x B300 | 2xXE9780x8B300-SXM-270GB | 6.0-0058 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| GPT-OSS 20B | 94.4 | 3.34 log perplexity | 8x GB200 | Tyche-hsg (1x NVIDIA GB200 NVL72) | 6.0-0017 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| GPT-OSS 20B | 19.2 | 3.34 log perplexity | 72x GB200 | Tyche-hsg (1x NVIDIA GB200 NVL72) | 6.0-0016 | Mixed | c4/en/3.0.1 | NVIDIA NeMo | NVIDIA Blackwell GPU (GB200) |
| GPT-OSS 20B | 27.0 | 3.34 log perplexity | 64x B200 | CoreWeave_B200_8x8 | 6.0-0002 | Mixed | c4/en/3.0.1 | PyTorch | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| Flux1 | 112.4 | 0.586 Eval loss | 16x GB300 | 4xXE9712x4GB300 | 6.0-0059 | Mixed | CC12M | PyTorch | NVIDIA Blackwell Ultra GPU (GB300) |
| Flux1 | 65.0 | 0.586 Eval loss | 32x GB300 | BM.GPU.GB300.4 | 6.0-0029 | Mixed | CC12M | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Flux1 | 36.5 | 0.586 Eval loss | 72x GB300 | DLB2-CB3 | 6.0-0063 | Mixed | CC12M | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Flux1 | 17.1 | 0.586 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 6.0-0100 | Mixed | CC12M | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| Flux1 | 77.5 | 0.586 Eval loss | 32x B300 | Nebius B300 n4 (32x B300-SXM-270GB) | 6.0-0024 | Mixed | CC12M | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| Flux1 | 46.7 | 0.586 Eval loss | 64x B300 | Nebius B300 n8 (64x B300-SXM-270GB) | 6.0-0025 | Mixed | CC12M | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| DLRM-dcnv2 | 2.1 | 0.80275 AUC | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 6.0-0097 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Merlin HugeCTR | NVIDIA Blackwell Ultra GPU (GB300) |
| DLRM-dcnv2 | 0.7 | 0.80275 AUC | 64x GB300 | DLB2-CB3 | 6.0-0062 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA NeMo | NVIDIA Blackwell Ultra GPU (GB300) |
| DLRM-dcnv2 | 1.9 | 0.80275 AUC | 16x B300 | 2xXE9780x8B300-SXM-270GB | 6.0-0058 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | PyTorch | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
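To put the multi-node results in perspective, the following sketch derives speedup and scaling efficiency from time-to-train figures. It uses the four DeepSeek v3 671B rows on GB200 that share the Tyche-hsg server and NVIDIA NeMo framework (512, 2,048, 4,096, and 8,192 GPUs). The function name `scaling_report` and the choice of the smallest run as the baseline are assumptions made for this illustration, not part of the MLPerf methodology.

```python
def scaling_report(runs: dict[int, float]) -> list[tuple[int, float, float]]:
    """Map {gpu_count: time_to_train_minutes} to (gpus, speedup, efficiency).

    Speedup and efficiency are measured against the smallest GPU count.
    Efficiency is the speedup divided by the increase in GPU count.
    """
    base_gpus = min(runs)
    base_time = runs[base_gpus]
    report = []
    for gpus in sorted(runs):
        speedup = base_time / runs[gpus]
        efficiency = speedup / (gpus / base_gpus)
        report.append((gpus, speedup, efficiency))
    return report


# Time to train (minutes) for DeepSeek v3 671B on GB200, Tyche-hsg rows above.
deepseek_gb200 = {512: 27.6, 2048: 7.8, 4096: 4.8, 8192: 3.3}

for gpus, speedup, efficiency in scaling_report(deepseek_gb200):
    print(f"{gpus:>5} GPUs: {speedup:5.2f}x speedup, {efficiency:6.1%} efficiency")
```

Time to train covers the whole run to the quality target, so these ratios fold in convergence effects (for example, a different global batch size at larger scale) as well as raw throughput. Rows from different submitters or frameworks are best not mixed in one comparison.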

MLPerf™ v6.0 Training Closed: 6.0-0003, 6.0-0005, 6.0-0010, 6.0-0013, 6.0-0017, 6.0-0023, 6.0-0024, 6.0-0025, 6.0-0027, 6.0-0028, 6.0-0030, 6.0-0035, 6.0-0037, 6.0-0038, 6.0-0039, 6.0-0041, 6.0-0042, 6.0-0043, 6.0-0065, 6.0-0066, 6.0-0067, 6.0-0070, 6.0-0073, 6.0-0079, 6.0-0083, 6.0-0084, 6.0-0085, 6.0-0086, 6.0-0087, 6.0-0088, 6.0-0089, 6.0-0090, 6.0-0091, 6.0-0094, 6.0-0095, 6.0-0096, 6.0-0097, 6.0-0098, 6.0-0100, 6.0-0101, 6.0-0102, 6.0-0103, 6.0-0104, 6.0-0105, 6.0-0106, 6.0-0107, 6.0-0113, 6.0-0117 | MLPerf name and logo are trademarks. See https://mlcommons.org/ for more information.


LLM Training Performance on NVIDIA Data Center Products


GB300 Training Performance


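| Framework | Model | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | DeepSeek v3 | 6,422 tokens/sec/gpu | 256x GB300 | DGX GB300 | - | 4096 | 1 | 2 | 1 | 32 | FP8 | 15360 | NVIDIA GB300 |
| NVIDIA NeMo | GPT-OSS 120B | 19,275 tokens/sec/gpu | 64x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 1 | 1 | 64 | BF16 | 1280 | NVIDIA GB300 |
| NVIDIA NeMo | Qwen3 30B a3B | 31,470 tokens/sec/gpu | 8x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 1 | 1 | 8 | FP8 | 512 | NVIDIA GB300 |
| NVIDIA NeMo | Qwen3 235B a22B | 6,994 tokens/sec/gpu | 256x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 4 | 1 | 32 | FP8 | 4096 | NVIDIA GB300 |
| NVIDIA NeMo | Nemotron 3 Nano | 38,102 tokens/sec/gpu | 8x GB300 | DGX GB300 | nemo:26.04 | 8192 | 1 | 1 | 1 | 8 | FP8 | 512 | NVIDIA GB300 |
| NVIDIA NeMo | Nemotron 3 Super | 9,623 tokens/sec/gpu | 64x GB300 | DGX GB300 | nemo:26.04 | 8192 | 1 | 1 | 1 | 64 | FP4 | 512 | NVIDIA GB300 |
| NVIDIA NeMo | Kimi K2 | 5,332 tokens/sec/gpu | 256x GB300 | DGX GB300 | nemo:26.04 | 4096 | 1 | 4 | 1 | 64 | FP8 | 4096 | NVIDIA GB300 |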

TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
EP: Expert Parallelism
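The parallelism columns and the throughput column can be related with a little arithmetic. The sketch below assumes a Megatron-style layout in which the GPU count factors as TP x PP x CP x data-parallel size, and in which EP distributes experts across existing ranks without multiplying the GPU count. That layout is an assumption about the framework and is not stated on this page. The `RunConfig` class and its property names are hypothetical helpers. The example values are read from the GPT-OSS 120B row of the GB300 table (64 GPUs, TP 1, PP 1, CP 1, sequence length 4096, global batch size 1280, 19,275 tokens/sec/gpu).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    gpus: int
    tp: int                      # tensor parallelism
    pp: int                      # pipeline parallelism
    cp: int                      # context parallelism
    seq_len: int
    global_batch: int            # sequences per optimizer step
    tokens_per_sec_per_gpu: float

    @property
    def data_parallel(self) -> int:
        """Data-parallel size, assuming gpus = TP * PP * CP * DP."""
        model_parallel = self.tp * self.pp * self.cp
        if self.gpus % model_parallel:
            raise ValueError("GPU count is not divisible by TP * PP * CP")
        return self.gpus // model_parallel

    @property
    def tokens_per_step(self) -> int:
        return self.global_batch * self.seq_len

    @property
    def cluster_tokens_per_sec(self) -> float:
        return self.tokens_per_sec_per_gpu * self.gpus

    @property
    def seconds_per_step(self) -> float:
        return self.tokens_per_step / self.cluster_tokens_per_sec


# GPT-OSS 120B on 64x GB300, values taken from the table above.
cfg = RunConfig(gpus=64, tp=1, pp=1, cp=1, seq_len=4096,
                global_batch=1280, tokens_per_sec_per_gpu=19_275)
print(cfg.data_parallel, cfg.tokens_per_step, round(cfg.seconds_per_step, 2))
```

Under these assumptions the run processes about 5.2 million tokens per optimizer step, which works out to a little over four seconds per step at the listed throughput.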

B300 Training Performance


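| Framework | Model | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | DeepSeek v3 | 3,131 tokens/sec/gpu | 256x B300 | DGX B300 | nemo:26.04 | 4096 | 1 | 4 | 1 | 64 | FP8 | 4096 | NVIDIA B300 |
| NVIDIA NeMo | GPT-OSS 120B | 15,114 tokens/sec/gpu | 64x B300 | DGX B300 | nemo:26.04 | 4096 | 1 | 1 | 1 | 8 | BF16 | 1280 | NVIDIA B300 |
| NVIDIA NeMo | Qwen3 235B a22B | 4,865 tokens/sec/gpu | 256x B300 | DGX B300 | nemo:26.04 | 4096 | 1 | 8 | 1 | 8 | FP8 | 8192 | NVIDIA B300 |
| NVIDIA NeMo | Nemotron 3 Super | 7,047 tokens/sec/gpu | 64x B300 | DGX B300 | nemo:26.04 | 8192 | 1 | 1 | 1 | 8 | FP4 | 512 | NVIDIA B300 |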

TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
EP: Expert Parallelism

B200 Training Performance


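| Framework | Model | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | DeepSeek v3 | 2,815 tokens/sec/gpu | 256x B200 | DGX B200 | nemo:26.04 | 4096 | 1 | 16 | 1 | 8 | FP8 | 4096 | NVIDIA B200 |
| NVIDIA NeMo | GPT-OSS 120B | 13,045 tokens/sec/gpu | 64x B200 | DGX B200 | nemo:26.04 | 4096 | 1 | 1 | 1 | 8 | BF16 | 4096 | NVIDIA B200 |
| NVIDIA NeMo | Qwen3 30B a3B | 26,859 tokens/sec/gpu | 8x B200 | DGX B200 | nemo:26.04 | 4096 | 1 | 1 | 1 | 8 | FP8 | 512 | NVIDIA B200 |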

TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
EP: Expert Parallelism

H100 Training Performance


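| Framework | Model | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | EP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | GPT-OSS 120B | 5,810 tokens/sec/gpu | 64x H100 | DGX H100 | nemo:26.04 | 4096 | 1 | 4 | 1 | 8 | BF16 | 1280 | H100-SXM5-80GB |
| NVIDIA NeMo | Qwen3 30B a3B | 8,901 tokens/sec/gpu | 16x H100 | DGX H100 | nemo:26.04 | 4096 | 1 | 1 | 1 | 16 | FP8 | 1024 | H100-SXM5-80GB |
| NVIDIA NeMo | Qwen3 235B a22B | 1,686 tokens/sec/gpu | 256x H100 | DGX H100 | nemo:26.04 | 4096 | 2 | 8 | 1 | 32 | FP8 | 8192 | H100-SXM5-80GB |
| NVIDIA NeMo | Nemotron 3 Nano | 14,507 tokens/sec/gpu | 16x H100 | DGX H100 | nemo:26.04 | 8192 | 1 | 1 | 1 | 8 | FP8 | 1024 | H100-SXM5-80GB |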

TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
EP: Expert Parallelism
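GPT-OSS 120B is the one model listed in all four tables on 64 GPUs with BF16, so it offers a rough way to compare per-GPU throughput across GPU generations. The sketch below normalizes the listed figures to the H100 result. The helper name `relative_to` is hypothetical. The rows also differ in parallelism settings and, for B200, in global batch size, so the ratios are indicative only.

```python
def relative_to(baseline: str, throughput: dict[str, float]) -> dict[str, float]:
    """Express each throughput figure as a multiple of the baseline entry."""
    base = throughput[baseline]
    return {name: value / base for name, value in throughput.items()}


# GPT-OSS 120B tokens/sec/gpu, copied from the four tables above.
gpt_oss_120b = {"GB300": 19_275, "B300": 15_114, "B200": 13_045, "H100": 5_810}

for gpu, ratio in relative_to("H100", gpt_oss_120b).items():
    print(f"{gpu:>6}: {ratio:.2f}x H100 per-GPU throughput")
```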




AI Inference

Real-world inferencing demands high throughput and low latencies with maximum efficiency across use cases. An industry-leading solution lets customers quickly deploy AI models into real-world production with the highest performance from data center to edge.


AI Pipeline

NVIDIA Riva is an application framework for multimodal conversational AI services that deliver real-time performance on GPUs.
