AI Training
Deploying AI in real-world applications requires training networks to convergence at a specified accuracy. Measuring time to train to that accuracy target is the most meaningful test of whether an AI system is ready to deliver results in the field.
NVIDIA Performance on MLPerf 5.1 Training Benchmarks
NVIDIA Performance on MLPerf 5.1’s AI Benchmarks: Single Node, Closed Division
| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NeMo | Llama2-70B-LoRA | 6 | 0.925 Eval loss | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0058 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 8.5 | 0.925 Eval loss | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 5.1-0008 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| | | 9 | 0.925 Eval loss | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0067 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 8.9 | 0.925 Eval loss | 8x B200 | 1xXE9680Lx8B200-SXM-180GB | 5.1-0030 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama 3.1 8B | 67.4 | 3.3 log perplexity | 8x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0058 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 75.8 | 3.3 log perplexity | 8x B300 | Nebius B300 n1 (8x B300-SXM-270GB) | 5.1-0008 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (B300-SXM-270GB) |
| | | 79.3 | 3.3 log perplexity | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0067 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 84.4 | 3.3 log perplexity | 8x B200 | SYS-422GS-NBRT-LCC | 5.1-0081 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | RetinaNet | 22.3 | 34.0% mAP | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0068 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 21.5 | 34.0% mAP | 8x B200 | AS-A126GS-TNBR | 5.1-0079 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| DGL | R-GAT | 5 | 72.0% classification | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0065 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 4.9 | 72.0% classification | 8x B200 | AS-A126GS-TNBR | 5.1-0079 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 2.2 | 0.80275 AUC | 8x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0066 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (GB200) |
| | | 2.3 | 0.80275 AUC | 8x B200 | G894-AD1 | 5.1-0040 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (B200-SXM-180GB) |
NVIDIA Performance on MLPerf 5.1’s AI Benchmarks: Multi Node, Closed Division
| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | Llama 3.1 405B | 64.6 | 5.6 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 84.9 | 5.6 log perplexity | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 18.8 | 5.6 log perplexity | 2,560x GB200 | hsg (40x NVIDIA GB200 NVL72) | 5.1-0003 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 10 | 5.6 log perplexity | 5,120x GB200 | hsg (80x NVIDIA GB200 NVL72) | 5.1-0004 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 256.3 | 5.6 log perplexity | 256x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0087 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 147.2 | 5.6 log perplexity | 448x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0089 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama2-70B-LoRA | 1.2 | 0.925 Eval loss | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 0.4 | 0.925 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | SCROLLS GovReport | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 1.4 | 0.925 Eval loss | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0092 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 0.5 | 0.925 Eval loss | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (GB200) |
| | | 5.8 | 0.925 Eval loss | 16x B200 | Nebius B200 n2 (16x B200-SXM-180GB) | 5.1-0006 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 3.1 | 0.925 Eval loss | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 1.9 | 0.925 Eval loss | 128x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0085 | Mixed | SCROLLS GovReport | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | Llama 3.1 8B | 13 | 3.3 log perplexity | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 5.2 | 3.3 log perplexity | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 15 | 3.3 log perplexity | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0063 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 5.4 | 3.3 log perplexity | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (GB200) |
| | | 51.8 | 3.3 log perplexity | 16x B200 | Nebius B200 n2 (16x B200-SXM-180GB) | 5.1-0006 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 27.8 | 3.3 log perplexity | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 18.1 | 3.3 log perplexity | 64x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0090 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 10 | 3.3 log perplexity | 256x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0087 | Mixed | c4/en/3.0.1 | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| PyTorch | Flux1 | 146.3 | 0.586 Eval loss | 16x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0059 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 44.5 | 0.586 Eval loss | 72x GB300 | Theia (1x NVIDIA GB300 NVL72) | 5.1-0057 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 17.1 | 0.586 Eval loss | 512x GB300 | Theia (8x NVIDIA GB300 NVL72) | 5.1-0060 | Mixed | CC12M | NVIDIA Blackwell Ultra GPU (GB300) |
| | | 160.7 | 0.586 Eval loss | 16x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0069 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 49.7 | 0.586 Eval loss | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0063 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 17.9 | 0.586 Eval loss | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0071 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 12.5 | 0.586 Eval loss | 1,152x GB200 | hsg (18x NVIDIA GB200 NVL72) | 5.1-0002 | Mixed | CC12M | NVIDIA Blackwell GPU (GB200) |
| | | 173.4 | 0.586 Eval loss | 16x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0086 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 93.2 | 0.586 Eval loss | 32x B200 | Nebius B200 n4 (32x B200-SXM-180GB) | 5.1-0007 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | | 54.5 | 0.586 Eval loss | 72x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0091 | Mixed | CC12M | NVIDIA Blackwell GPU (B200-SXM-180GB) |
| | RetinaNet | 3.8 | 34.0% mAP | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0064 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 1.4 | 34.0% mAP | 512x GB200 | Tyche (8x NVIDIA GB200 NVL72) | 5.1-0072 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (GB200) |
| | | 12.3 | 34.0% mAP | 16x B200 | 2xXE9680Lx8B200-SXM-180GB | 5.1-0037 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200) |
| | | 10.1 | 34.0% mAP | 128x B200 | HiPerGator NVIDIA DGX B200 (8x B200-SXM-180GB) | 5.1-0085 | Mixed | A subset of OpenImages | NVIDIA Blackwell GPU (B200) |
| DGL | R-GAT | 1.1 | 72.0% classification | 72x GB200 | Tyche (1x NVIDIA GB200 NVL72) | 5.1-0062 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 0.8 | 72.0% classification | 256x GB200 | Tyche (4x NVIDIA GB200 NVL72) | 5.1-0070 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (GB200) |
| | | 3.2 | 72.0% classification | 16x B200 | 2xXE9680Lx8B200-SXM-180GB | 5.1-0037 | Mixed | IGBH-Full | NVIDIA Blackwell GPU (B200) |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 0.7 | 0.80275 AUC | 64x GB200 | SRS-GB200-NVL72-M1 (16x ARS-121GL-NBO) | 5.0-0087 | Mixed | Criteo 3.5TB Click Logs (multi-hot variant) | NVIDIA Blackwell GPU (GB200) |
MLPerf™ v5.1 Training Closed: 5.0-0087, 5.1-0002, 5.1-0003, 5.1-0004, 5.1-0006, 5.1-0007, 5.1-0008, 5.1-0030, 5.1-0037, 5.1-0040, 5.1-0057, 5.1-0058, 5.1-0059, 5.1-0060, 5.1-0062, 5.1-0063, 5.1-0064, 5.1-0065, 5.1-0066, 5.1-0067, 5.1-0068, 5.1-0069, 5.1-0070, 5.1-0071, 5.1-0072, 5.1-0079, 5.1-0081, 5.1-0085, 5.1-0086, 5.1-0087, 5.1-0089, 5.1-0090, 5.1-0091, 5.1-0092 | MLPerf name and logo are trademarks. See https://mlcommons.org/ for more information.
For Training rules and guidelines, see MLCommons (https://mlcommons.org/).
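A useful way to read the multi-node results is through scaling efficiency: when the GPU count grows by a factor of k, ideal scaling cuts time to train by the same factor. Below is a minimal sketch of that arithmetic in Python; the helper function is ours, and the input numbers are the Llama 3.1 405B GB200 rows from the table above.

```python
def scaling_efficiency(gpus_base, mins_base, gpus_scaled, mins_scaled):
    """Strong-scaling efficiency: achieved speedup divided by ideal speedup."""
    ideal = gpus_scaled / gpus_base
    achieved = mins_base / mins_scaled
    return achieved / ideal

# Llama 3.1 405B pretraining on GB200 NVL72 (multi-node table above)
print(scaling_efficiency(512, 84.9, 2560, 18.8))  # ~0.90 at 5x the GPUs
print(scaling_efficiency(512, 84.9, 5120, 10.0))  # ~0.85 at 10x the GPUs
```

By this measure, the GB200 submissions retain roughly 90% scaling efficiency going from 512 to 2,560 GPUs, and roughly 85% going to 5,120 GPUs.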
LLM Training Performance on NVIDIA Data Center Products
B200 Training Performance
| Framework | Model | Time to Train (days) | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NeMo | Llama3 8B | 0.4 | 30,131 tokens/sec | 8x B200 | DGX B200 | nemo:25.07 | 8192 | 1 | 1 | 1 | FP8 | 128 | NVIDIA B200 |
| | Llama3 70B | 3.1 | 3,690 tokens/sec | 64x B200 | DGX B200 | nemo:25.07 | 8192 | 1 | 1 | 1 | FP8 | 128 | NVIDIA B200 |
| | Llama3 405B | 16.8 | 674 tokens/sec | 128x B200 | DGX B200 | nemo:25.07 | 8192 | 4 | 8 | 2 | FP8 | 64 | NVIDIA B200 |
| | Llama4 Scout | 1 | 11,260 tokens/sec | 64x B200 | DGX B200 | nemo:25.07 | 8192 | 1 | 2 | 1 | FP8 | 1024 | NVIDIA B200 |
| | Llama4 Maverick | 1.2 | 9,811 tokens/sec | 128x B200 | DGX B200 | nemo:25.07 | 8192 | 1 | 2 | 1 | FP8 | 1024 | NVIDIA B200 |
| | Mixtral 8x7B | 0.7 | 16,384 tokens/sec | 64x B200 | DGX B200 | nemo:25.07 | 4096 | 1 | 1 | 1 | FP8 | 256 | NVIDIA B200 |
| | Mixtral 8x22B | 4.4 | 2,548 tokens/sec | 256x B200 | DGX B200 | nemo:25.07 | 65536 | 2 | 4 | 8 | FP8 | 64 | NVIDIA B200 |
| | Nemotron5-H56B | 2.7 | 4,123 tokens/sec | 64x B200 | DGX B200 | nemo:25.07 | 8192 | 4 | 1 | 1 | FP8 | 192 | NVIDIA B200 |
| | DeepSeek v3 | 6.9 | 1,640 tokens/sec | 256x B200 | DGX B200 | nemo:25.07 | 4096 | 2 | 16 | 1 | FP8 | 2048 | NVIDIA B200 |
TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
Time to Train is the estimated time to train on 1T tokens with 1K (1,024) GPUs.
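That footnote implies a simple back-of-the-envelope check: time to train is total tokens divided by aggregate cluster throughput. Here is a minimal sketch; the helper function is ours, and the per-GPU throughputs come from the table above.

```python
def days_to_train(tokens_per_sec_per_gpu, num_gpus=1024, total_tokens=1e12):
    """Estimated training time in days: total tokens / aggregate throughput."""
    seconds = total_tokens / (tokens_per_sec_per_gpu * num_gpus)
    return seconds / 86400  # seconds per day

print(days_to_train(30_131))  # Llama3 8B   -> ~0.4 days, matching the table
print(days_to_train(674))     # Llama3 405B -> ~16.8 days, matching the table
```

Both estimates round to the table's Time to Train values, which is how that column follows from the Throughput per GPU column.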
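The TP, PP, and CP columns give the model-sharding layout; whatever remains of the GPU count is data parallelism, since in Megatron-style training the world size factors as TP x PP x CP x DP. A minimal sketch of that bookkeeping follows; the helper function is ours, and the example values are the Llama3 405B row above.

```python
def data_parallel_size(num_gpus, tp, pp, cp):
    """Data-parallel replicas left after tensor/pipeline/context sharding."""
    model_shards = tp * pp * cp
    assert num_gpus % model_shards == 0, "GPU count must divide by TP*PP*CP"
    return num_gpus // model_shards

# Llama3 405B row: 128 GPUs, TP=4, PP=8, CP=2 -> 64-way model sharding
print(data_parallel_size(128, tp=4, pp=8, cp=2))  # -> DP = 2
```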
AI Inference
Real-world inference demands high throughput and low latency, with maximum efficiency across use cases. NVIDIA's industry-leading inference platform lets customers quickly deploy AI models into production with the highest performance from data center to edge.
AI Pipeline
NVIDIA Riva is an application framework for multimodal conversational AI services that deliver real-time performance on GPUs.