AI Training
Deploying AI in real-world applications requires training networks to convergence at a specified accuracy. Measuring the time it takes to reach that accuracy is the most reliable way to test whether an AI system is ready to be deployed in the field and deliver meaningful results.
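To make the methodology concrete, the sketch below shows the shape of a convergence-based measurement: train until a quality target is first reached and report elapsed wall-clock time. It is a minimal illustration, assuming generic `train_one_epoch` and `evaluate` callables rather than any particular framework's API; the 0.759 target is ResNet-50 v1.5's MLPerf quality target from the tables below.

```python
# Minimal sketch of convergence-based benchmarking. The callables
# train_one_epoch() and evaluate() are hypothetical stand-ins for a
# real training loop and validation pass.
import time

TARGET_ACCURACY = 0.759  # e.g., ResNet-50 v1.5's MLPerf quality target

def time_to_train(model, train_one_epoch, evaluate, max_epochs=100):
    """Wall-clock minutes until the model first reaches the target."""
    start = time.perf_counter()
    for _ in range(max_epochs):
        train_one_epoch(model)
        if evaluate(model) >= TARGET_ACCURACY:
            return (time.perf_counter() - start) / 60.0
    raise RuntimeError("did not converge within max_epochs")
```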
NVIDIA Performance on MLPerf 4.0 Training Benchmarks
NVIDIA Performance on MLPerf 4.0’s AI Benchmarks: Single Node, Closed Division
| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | Llama2-70B-LoRA | 24.7 | 0.925 cross entropy loss | 8x H200 | NVIDIA H200 | 4.0-0071 | Mixed | SCROLLS GovReport | H200-SXM5-141GB |
| NVIDIA NeMo | Llama2-70B-LoRA | 28.2 | 0.925 cross entropy loss | 8x H100 | Eos | 4.0-0050 | Mixed | SCROLLS GovReport | H100-SXM5-80GB |
| DGL | R-GAT | 7.7 | 72.0% classification | 8x H200 | NVIDIA H200 | 4.0-0068 | Mixed | IGBH-Full | H200-SXM5-141GB |
| DGL | R-GAT | 11.3 | 72.0% classification | 8x H100 | Eos | 4.0-0047 | Mixed | IGBH-Full | H100-SXM5-80GB |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 3.5 | 0.80275 AUC | 8x H200 | NVIDIA H200 | 4.0-0070 | Mixed | Criteo 4TB | H200-SXM5-141GB |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 3.9 | 0.80275 AUC | 8x H100 | Eos | 4.0-0049 | Mixed | Criteo 4TB | H100-SXM5-80GB |
| NVIDIA NeMo | Stable Diffusion v2.0 | 41.3 | FID<=90 and CLIP>=0.15 | 8x H200 | NVIDIA H200 | 4.0-0071 | Mixed | LAION-400M-filtered | H200-SXM5-141GB |
| NVIDIA NeMo | Stable Diffusion v2.0 | 42.2 | FID<=90 and CLIP>=0.15 | 8x H100 | Eos | 4.0-0050 | Mixed | LAION-400M-filtered | H100-SXM5-80GB |
| PyTorch | BERT | 5.2 | 0.72 Mask-LM accuracy | 8x H200 | NVIDIA H200 | 4.0-0072 | Mixed | Wikipedia 2020/01/01 | H200-SXM5-141GB |
| PyTorch | BERT | 5.5 | 0.72 Mask-LM accuracy | 8x H100 | Eos | 4.0-0052 | Mixed | Wikipedia 2020/01/01 | H100-SXM5-80GB |
| PyTorch | RetinaNet | 34.3 | 34.0% mAP | 8x H200 | NVIDIA H200 | 4.0-0073 | Mixed | Subset of OpenImages | H200-SXM5-141GB |
| PyTorch | RetinaNet | 35.5 | 34.0% mAP | 8x H100 | Eos | 4.0-0051 | Mixed | Subset of OpenImages | H100-SXM5-80GB |
| MXNet | 3D U-Net | 11.5 | 0.908 Mean DICE score | 8x H200 | NVIDIA H200 | 4.0-0069 | Mixed | KiTS19 | H200-SXM5-141GB |
| MXNet | 3D U-Net | 12.1 | 0.908 Mean DICE score | 8x H100 | Eos | 4.0-0048 | Mixed | KiTS19 | H100-SXM5-80GB |
| MXNet | ResNet-50 v1.5 | 12.1 | 75.90% classification | 8x H200 | NVIDIA H200 | 4.0-0069 | Mixed | ImageNet | H200-SXM5-141GB |
| MXNet | ResNet-50 v1.5 | 13.3 | 75.90% classification | 8x H100 | Eos | 4.0-0048 | Mixed | ImageNet | H100-SXM5-80GB |
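Each network appears as an H200/H100 pair, so dividing the H100 time by the H200 time gives the per-benchmark speedup. A minimal sketch, with the times (in minutes) copied from the table above:

```python
# Per-benchmark H200 vs. H100 speedup from the single-node table.
# The dict name is illustrative; values are copied from the table.
single_node_minutes = {
    # network: (8x H200 minutes, 8x H100 minutes)
    "Llama2-70B-LoRA":       (24.7, 28.2),
    "R-GAT":                 (7.7, 11.3),
    "DLRM-dcnv2":            (3.5, 3.9),
    "Stable Diffusion v2.0": (41.3, 42.2),
    "BERT":                  (5.2, 5.5),
    "RetinaNet":             (34.3, 35.5),
    "3D U-Net":              (11.5, 12.1),
    "ResNet-50 v1.5":        (12.1, 13.3),
}

for network, (h200, h100) in single_node_minutes.items():
    # Speedup > 1.0 means the H200 run finished faster than the H100 run.
    print(f"{network:24s} {h100 / h200:.2f}x")
```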
NVIDIA Performance on MLPerf 4.0’s AI Benchmarks: Multi Node, Closed Division
| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| NVIDIA NeMo | GPT3 | 50.7 | 2.69 log perplexity | 512x H100 | Eos_n64 | 4.0-0059 | Mixed | c4/en/3.0.1 | H100-SXM5-80GB |
| NVIDIA NeMo | GPT3 | 3.7 | 2.69 log perplexity | 10,752x H100 | Eos-dfw_n1344 | 4.0-0006 | Mixed | c4/en/3.0.1 | H100-SXM5-80GB |
| NVIDIA NeMo | GPT3 | 3.4 | 2.69 log perplexity | 11,616x H100 | Eos-dfw_n1452 | 4.0-0007 | Mixed | c4/en/3.0.1 | H100-SXM5-80GB |
| NVIDIA NeMo | Llama2-70B-LoRA | 5.3 | 0.925 cross entropy loss | 64x H100 | Eos_n8 | 4.0-0063 | Mixed | SCROLLS GovReport | H100-SXM5-80GB |
| NVIDIA NeMo | Llama2-70B-LoRA | 1.5 | 0.925 cross entropy loss | 1,024x H100 | Eos_n128 | 4.0-0053 | Mixed | SCROLLS GovReport | H100-SXM5-80GB |
| DGL | R-GAT | 2.7 | 72.0% classification | 64x H100 | Eos_n8 | 4.0-0060 | Mixed | IGBH-Full | H100-SXM5-80GB |
| DGL | R-GAT | 1.1 | 72.0% classification | 512x H100 | Eos_n64 | 4.0-0058 | Mixed | IGBH-Full | H100-SXM5-80GB |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 1.4 | 0.80275 AUC | 64x H100 | Eos_n8 | 4.0-0062 | Mixed | Criteo 4TB | H100-SXM5-80GB |
| NVIDIA Merlin HugeCTR | DLRM-dcnv2 | 1.0 | 0.80275 AUC | 128x H100 | Eos_n16 | 4.0-0054 | Mixed | Criteo 4TB | H100-SXM5-80GB |
| NVIDIA NeMo | Stable Diffusion v2.0 | 6.7 | FID<=90 and CLIP>=0.15 | 64x H100 | Eos_n8 | 4.0-0063 | Mixed | LAION-400M-filtered | H100-SXM5-80GB |
| NVIDIA NeMo | Stable Diffusion v2.0 | 1.8 | FID<=90 and CLIP>=0.15 | 512x H100 | Eos_n64 | 4.0-0059 | Mixed | LAION-400M-filtered | H100-SXM5-80GB |
| NVIDIA NeMo | Stable Diffusion v2.0 | 1.4 | FID<=90 and CLIP>=0.15 | 1,024x H100 | Eos_n128 | 4.0-0053 | Mixed | LAION-400M-filtered | H100-SXM5-80GB |
| PyTorch | BERT | 0.9 | 0.72 Mask-LM accuracy | 64x H100 | Eos_n8 | 4.0-0064 | Mixed | Wikipedia 2020/01/01 | H100-SXM5-80GB |
| PyTorch | BERT | 0.1 | 0.72 Mask-LM accuracy | 3,472x H100 | Eos_n434 | 4.0-0057 | Mixed | Wikipedia 2020/01/01 | H100-SXM5-80GB |
| PyTorch | RetinaNet | 6.1 | 34.0% mAP | 64x H100 | Eos_n8 | 4.0-0065 | Mixed | Subset of OpenImages | H100-SXM5-80GB |
| PyTorch | RetinaNet | 0.8 | 34.0% mAP | 2,528x H100 | Eos_n316 | 4.0-0056 | Mixed | Subset of OpenImages | H100-SXM5-80GB |
| MXNet | 3D U-Net | 1.9 | 0.908 Mean DICE score | 72x H100 | Eos_n9 | 4.0-0066 | Mixed | KiTS19 | H100-SXM5-80GB |
| MXNet | 3D U-Net | 0.8 | 0.908 Mean DICE score | 768x H100 | Eos_n96 | 4.0-0067 | Mixed | KiTS19 | H100-SXM5-80GB |
| MXNet | ResNet-50 v1.5 | 2.5 | 75.90% classification | 64x H100 | Eos_n8 | 4.0-0061 | Mixed | ImageNet | H100-SXM5-80GB |
| MXNet | ResNet-50 v1.5 | 0.2 | 75.90% classification | 3,584x H100 | NVIDIA+CoreWeave Joint Submission | 4.0-0008 | Mixed | ImageNet | H100-SXM5-80GB |
MLPerf™ v4.0 Training Closed: 4.0-0006, 4.0-0007, 4.0-0008, 4.0-0047, 4.0-0048, 4.0-0049, 4.0-0050, 4.0-0051, 4.0-0052, 4.0-0053, 4.0-0054, 4.0-0055, 4.0-0056, 4.0-0057, 4.0-0058, 4.0-0059, 4.0-0060, 4.0-0061, 4.0-0062, 4.0-0063, 4.0-0064, 4.0-0065, 4.0-0066, 4.0-0067, 4.0-0068, 4.0-0069, 4.0-0070, 4.0-0071, 4.0-0072, 4.0-0073 | MLPerf name and logo are trademarks. See https://mlcommons.org/ for more information.
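One way to read the multi-node results is strong-scaling efficiency: the benchmark workload is fixed, so ideal scaling would keep total GPU-minutes constant as GPUs are added. A minimal sketch using the GPT3 rows above; since per-run times include fixed startup and communication overheads, efficiency below 100% is expected at the largest scales.

```python
# Strong-scaling efficiency from the multi-node GPT3 rows above.
# Ideal scaling keeps GPU-minutes constant, so
# efficiency = (baseline GPU-minutes) / (scaled GPU-minutes).
runs = [
    # (gpus, minutes) for GPT3, copied from the table
    (512, 50.7),
    (10_752, 3.7),
    (11_616, 3.4),
]

base_gpus, base_minutes = runs[0]
baseline_gpu_minutes = base_gpus * base_minutes

for gpus, minutes in runs[1:]:
    efficiency = baseline_gpu_minutes / (gpus * minutes)
    print(f"{gpus:>6} GPUs: {minutes:4.1f} min, "
          f"{efficiency:.0%} of linear scaling vs. 512 GPUs")
```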
NVIDIA Performance on MLPerf 3.0’s Training HPC Benchmarks: Closed Division
| Framework | Network | Time to Train (mins) | MLPerf Quality Target | GPU | Server | MLPerf-ID | Precision | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|
| PyTorch | CosmoFlow | 2.1 | Mean average error 0.124 | 512x H100 | eos | 3.0-8006 | Mixed | CosmoFlow N-body cosmological simulation data with 4 cosmological parameter targets | H100-SXM5-80GB |
| PyTorch | DeepCAM | 0.8 | IOU 0.82 | 2,048x H100 | eos | 3.0-8007 | Mixed | CAM5+TECA climate simulation with 3 target classes (atmospheric river, tropical cyclone, background) | H100-SXM5-80GB |
| PyTorch | OpenCatalyst | 10.7 | Forces mean absolute error 0.036 | 640x H100 | eos | 3.0-8008 | Mixed | Open Catalyst 2020 (OC20) S2EF 2M training split, ID validation set | H100-SXM5-80GB |
| PyTorch | OpenFold | 7.5 | Local Distance Difference Test (lDDT-Cα) >= 0.8 | 2,080x H100 | eos | 3.0-8009 | Mixed | OpenProteinSet and Protein Data Bank | H100-SXM5-80GB |
MLPerf™ v3.0 Training HPC Closed: 3.0-8006, 3.0-8007, 3.0-8008, 3.0-8009 | MLPerf name and logo are trademarks. See https://mlcommons.org/ for more information.
LLM Training Performance on NVIDIA Data Center Products
H100 Training Performance
| Framework | Model | Time to Train (days) | Throughput per GPU | GPU | Server | Container Version | Sequence Length | TP | PP | CP | Precision | Global Batch Size | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NeMo | GPT3 5B | 0.5 | 23,117 tokens/sec | 64x H100 | Eos | nemo:24.05 | 2,048 | 1 | 1 | 1 | FP8 | 2,048 | H100-SXM5-80GB |
| NeMo | GPT3 20B | 2 | 5,611 tokens/sec | 64x H100 | Eos | nemo:24.05 | 2,048 | 2 | 1 | 1 | FP8 | 256 | H100-SXM5-80GB |
| NeMo | Llama2 7B | 0.7 | 16,154 tokens/sec | 8x H100 | Eos | nemo:24.05 | 4,096 | 1 | 1 | 1 | FP8 | 128 | H100-SXM5-80GB |
| NeMo | Llama2 13B | 1.4 | 8,344 tokens/sec | 16x H100 | Eos | nemo:24.05 | 4,096 | 1 | 4 | 1 | FP8 | 128 | H100-SXM5-80GB |
| NeMo | Llama2 70B | 6.8 | 1,659 tokens/sec | 64x H100 | Eos | nemo:24.05 | 4,096 | 4 | 4 | 1 | FP8 | 128 | H100-SXM5-80GB |
| NeMo | Llama3 8B | 1 | 11,879 tokens/sec | 8x H100 | Eos | nemo:24.05 | 8,192 | 1 | 1 | 2 | FP8 | 128 | H100-SXM5-80GB |
| NeMo | Llama3 70B | 7.8 | 1,444 tokens/sec | 64x H100 | Eos | nemo:24.05 | 8,192 | 4 | 4 | 2 | FP8 | 128 | H100-SXM5-80GB |
TP: Tensor Parallelism
PP: Pipeline Parallelism
CP: Context Parallelism
Time to Train is the estimated time to train on 1T tokens with 1K GPUs.
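A minimal sketch of how that estimate works, assuming "1K GPUs" means 1,024 and that the per-GPU throughput column scales linearly; the same parallelism columns also pin down the data-parallel replica count:

```python
# Time-to-train estimate from per-GPU throughput, under the stated
# assumptions (1,024 GPUs, linear scaling of the throughput column).
TOKENS = 1e12       # 1T-token training budget
GPUS = 1024         # reading "1K GPUs" as 1,024
SECONDS_PER_DAY = 86_400

def days_to_train(tokens_per_sec_per_gpu: float) -> float:
    """Estimated days to consume the token budget on GPUS GPUs."""
    return TOKENS / (tokens_per_sec_per_gpu * GPUS * SECONDS_PER_DAY)

# GPT3 5B from the table: 23,117 tokens/sec per GPU -> ~0.5 days.
print(f"GPT3 5B: {days_to_train(23_117):.2f} days")

# The parallelism columns determine the data-parallel size:
# DP = world_size / (TP * PP * CP). For the Llama2 70B row,
# 64 GPUs with TP=4, PP=4, CP=1 gives DP = 4 replicas, so the
# global batch of 128 splits into 32 samples per replica.
world, tp, pp, cp, gbs = 64, 4, 4, 1, 128
dp = world // (tp * pp * cp)
print(f"Llama2 70B: DP={dp}, per-replica batch={gbs // dp}")
```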
Converged Training Performance on NVIDIA Data Center GPUs
H200 Training Performance
| Framework | Framework Version | Network | Time to Train (mins) | Accuracy | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyTorch | 2.4.0a0 | Tacotron2 | 62 | 0.54 Training Loss | 514,893 total output mels/sec | 8x H200 | DGX H200 | 24.06-py3 | TF32 | 128 | LJSpeech 1.1 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | WaveGlow | 110 | -5.7 Training Loss | 3,974,080 output samples/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 10 | LJSpeech 1.1 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | GNMT v2 | 9 | 24.26 BLEU Score | 1,870,930 total tokens/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 128 | wmt16-en-de | NVIDIA H200 |
| PyTorch | 2.4.0a0 | NCF | | 0.96 Hit Rate at 10 | 244,942,025 samples/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 131,072 | MovieLens 20M | NVIDIA H200 |
| PyTorch | 2.4.0a0 | FastPitch | 72 | 0.17 Training Loss | 1,350,880 frames/sec | 8x H200 | DGX H200 | 24.06-py3 | TF32 | 32 | LJSpeech 1.1 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | Transformer XL Large | 277 | 17.87 Perplexity | 301,765 total tokens/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 16 | WikiText-103 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | Transformer XL Base | 122 | 21.58 Perplexity | 1,100,513 total tokens/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 128 | WikiText-103 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | EfficientNet-B4 | 101 | 82. Top 1 | 6,030 images/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 128 | Imagenet2012 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | EfficientDet-D0 | 307 | 0.33 BBOX mAP | 2,755 images/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 150 | COCO 2017 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | EfficientNet-WideSE-B4 | 1,451 | 82.21 Top 1 | 6,024 images/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 128 | Imagenet2012 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | TFT-Electricity | 2 | 0.03 Test P90 | 158,366 items/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 1,024 | Electricity | NVIDIA H200 |
| PyTorch | 2.4.0a0 | HiFiGAN | 911 | 9.22 Training Loss | 119,911 total output mels/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 16 | LJSpeech 1.1 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | GPUNet-0 | 1,054 | 78.86 Top 1 | 9,934 images/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 192 | Imagenet2012 | NVIDIA H200 |
| PyTorch | 2.4.0a0 | GPUNet-1 | 963 | 80.33 Top 1 | 10,905 images/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 192 | Imagenet2012 | NVIDIA H200 |
| TensorFlow | 2.16.1 | U-Net Medical | 2 | 0.89 DICE Score | 2,356 images/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 8 | EM segmentation challenge | NVIDIA H200 |
| TensorFlow | 2.16.1 | Wide and Deep | 4 | 0.66 MAP at 12 | 11,859,112 samples/sec | 8x H200 | DGX H200 | 24.06-py3 | Mixed | 16,384 | Kaggle Outbrain Click Prediction | NVIDIA H200 |
H100 Training Performance
| Framework | Framework Version | Network | Time to Train (mins) | Accuracy | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyTorch | 2.4.0a0 | Tacotron2 | 67 | 0.56 Training Loss | 473,451 total output mels/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 128 | LJSpeech 1.1 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | WaveGlow | 116 | -5.73 Training Loss | 3,738,190 output samples/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 10 | LJSpeech 1.1 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | GNMT v2 | 9 | 24.11 BLEU Score | 1,710,731 total tokens/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 128 | wmt16-en-de | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | NCF | | 0.96 Hit Rate at 10 | 219,720,903 samples/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 131,072 | MovieLens 20M | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | FastPitch | 72 | 0.17 Training Loss | 1,364,338 frames/sec | 8x H100 | DGX H100 | 24.06-py3 | TF32 | 32 | LJSpeech 1.1 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | Transformer XL Large | 318 | 17.83 Perplexity | 261,789 total tokens/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 16 | WikiText-103 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | Transformer XL Base | 140 | 21.58 Perplexity | 957,832 total tokens/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 128 | WikiText-103 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | EfficientNet-B4 | 1,658 | 81.92 Top 1 | 5,251 images/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 128 | Imagenet2012 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | EfficientDet-D0 | 317 | 0.33 BBOX mAP | 2,630 images/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 150 | COCO 2017 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | EfficientNet-WideSE-B4 | 1,668 | 82.28 Top 1 | 5,223 images/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 128 | Imagenet2012 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | HiFiGAN | 944 | 9.67 Training Loss | 116,668 total output mels/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 16 | LJSpeech 1.1 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | GPUNet-0 | 1,062 | 78.69 Top 1 | 9,856 images/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 192 | Imagenet2012 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | GPUNet-1 | 956 | 80.29 Top 1 | 10,981 images/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 192 | Imagenet2012 | H100-SXM5-80GB |
| PyTorch | 2.4.0a0 | MoFlow | 35 | 86.9 NUV | 45,008 molecules/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 512 | ZINC | H100-SXM5-80GB |
| TensorFlow | 2.16.1 | U-Net Medical | 1 | 0.89 DICE Score | 2,061 images/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 8 | EM segmentation challenge | H100-SXM5-80GB |
| TensorFlow | 2.15.0 | Electra Fine Tuning | | . F1 | sequences/sec | 8x H100 | DGX H100 | 24.02-py3 | Mixed | 32 | SQuAD v1.1 | H100-SXM5-80GB |
| TensorFlow | 2.16.1 | Wide and Deep | 4 | 0.66 MAP at 12 | 10,746,049 samples/sec | 8x H100 | DGX H100 | 24.06-py3 | Mixed | 16,384 | Kaggle Outbrain Click Prediction | H100-SXM5-80GB |
A30 Training Performance
| Framework | Framework Version | Network | Time to Train (mins) | Accuracy | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyTorch | 2.4.0a0 | Tacotron2 | 134 | 0.51 Training Loss | 223,242 total output mels/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 104 | LJSpeech 1.1 | A30 |
| PyTorch | 2.4.0a0 | WaveGlow | 400 | -5.76 Training Loss | 1,055,135 output samples/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 10 | LJSpeech 1.1 | A30 |
| PyTorch | 2.4.0a0 | GNMT v2 | 58 | 24.3 BLEU Score | 314,472 total tokens/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 128 | wmt16-en-de | A30 |
| PyTorch | 2.4.0a0 | NCF | 1 | 0.96 Hit Rate at 10 | 41,874,445 samples/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 131,072 | MovieLens 20M | A30 |
| PyTorch | 2.4.0a0 | FastPitch | 154 | 0.17 Training Loss | 548,158 frames/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 16 | LJSpeech 1.1 | A30 |
| PyTorch | 2.4.0a0 | Transformer XL Base | 198 | 22.87 Perplexity | 168,143 total tokens/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 32 | WikiText-103 | A30 |
| PyTorch | 2.4.0a0 | EfficientNet-B0 | 782 | 77.02 Top 1 | 11,319 images/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 128 | Imagenet2012 | A30 |
| PyTorch | 2.4.0a0 | EfficientNet-WideSE-B0 | 805 | 77.17 Top 1 | 11,038 images/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 128 | Imagenet2012 | A30 |
| PyTorch | 2.4.0a0 | MoFlow | 102 | 93.96 NUV | 11,986 molecules/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 512 | ZINC | A30 |
| TensorFlow | 2.16.1 | U-Net Medical | 4 | 0.89 DICE Score | 475 images/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 8 | EM segmentation challenge | A30 |
| TensorFlow | 2.15.0 | Electra Fine Tuning | 5 | 92.63 F1 | 1,024 sequences/sec | 8x A30 | GIGABYTE G482-Z52-00 | 24.02-py3 | Mixed | 16 | SQuAD v1.1 | A30 |
| TensorFlow | | SIM | | . AUC | samples/sec | 8x A30 | GIGABYTE G482-Z52-00 | 23.12-py3 | Mixed | 16,384 | Amazon Reviews | A30 |
A10 Training Performance
| Framework | Framework Version | Network | Time to Train (mins) | Accuracy | Throughput | GPU | Server | Container | Precision | Batch Size | Dataset | GPU Version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyTorch | 2.4.0a0 | Tacotron2 | 147 | 0.53 Training Loss | 212,211 total output mels/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 104 | LJSpeech 1.1 | A10 |
| PyTorch | 2.4.0a0 | WaveGlow | 506 | -5.8 Training Loss | 834,526 output samples/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 10 | LJSpeech 1.1 | A10 |
| PyTorch | 2.4.0a0 | GNMT v2 | 69 | 24.4 BLEU Score | 262,565 total tokens/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 128 | wmt16-en-de | A10 |
| PyTorch | 2.4.0a0 | NCF | 1 | 0.96 Hit Rate at 10 | 35,888,159 samples/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 131,072 | MovieLens 20M | A10 |
| PyTorch | 2.4.0a0 | FastPitch | 181 | 0.17 Training Loss | 461,920 frames/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 16 | LJSpeech 1.1 | A10 |
| PyTorch | 2.4.0a0 | Transformer XL Base | 282 | 22.8 Perplexity | 117,636 total tokens/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 32 | WikiText-103 | A10 |
| PyTorch | 2.4.0a0 | EfficientNet-B0 | 1,041 | 77.15 Top 1 | 8,465 images/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 128 | Imagenet2012 | A10 |
| PyTorch | 2.4.0a0 | EfficientNet-WideSE-B0 | 1,056 | 77.3 Top 1 | 8,350 images/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 128 | Imagenet2012 | A10 |
| PyTorch | 2.4.0a0 | MoFlow | 100 | 86.84 NUV | 12,270 molecules/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 512 | ZINC | A10 |
| TensorFlow | 2.16.1 | U-Net Medical | 4 | 0.89 DICE Score | 382 images/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.06-py3 | Mixed | 8 | EM segmentation challenge | A10 |
| TensorFlow | 2.15.0 | Electra Fine Tuning | 5 | 92.52 F1 | 826 sequences/sec | 8x A10 | GIGABYTE G482-Z52-00 | 24.02-py3 | Mixed | 16 | SQuAD v1.1 | A10 |
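As a rough cross-check on the converged runs, multiplying a row's throughput by its time to train approximates the total samples processed, and dividing by the dataset size gives an approximate epoch count. A minimal sketch, assuming the quoted throughput is sustained end to end and using ImageNet's roughly 1.28M training images:

```python
# Rough cross-check: images processed = throughput * time; dividing
# by the ImageNet training-set size approximates the epoch count.
IMAGENET_TRAIN_IMAGES = 1_281_167

def approx_epochs(images_per_sec: float, minutes: float) -> float:
    return images_per_sec * minutes * 60 / IMAGENET_TRAIN_IMAGES

# EfficientNet-B0 on 8x A30: 11,319 images/sec for 782 minutes.
print(f"~{approx_epochs(11_319, 782):.0f} epochs")  # roughly 400+
```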
AI Inference
Real-world inference demands high throughput and low latency with maximum efficiency across use cases. An industry-leading solution lets customers quickly deploy AI models into real-world production with the highest performance from data center to edge.
AI Pipeline
NVIDIA Riva is an application framework for multimodal conversational AI services that deliver real-time performance on GPUs.