

This page provides the latest performance benchmark data for key High Performance Computing (HPC) applications. Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA® Tesla® accelerated computing platform powers these data centers with industry-leading applications that accelerate HPC and AI workloads. The Tesla V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

A single GPU-accelerated server can replace over 100 CPU-only servers. The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure benchmark performance on up to 8 CPU-only servers and then use linear scaling to extrapolate beyond 8 servers. The NRF varies with the application running on the server.
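The NRF procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact method behind the tables: the function names, the interpolation between measured points, and the reading of "linear scaling" as proportional growth from the largest measured CPU configuration are all assumptions, and the numbers in the usage note are made up rather than taken from the benchmarks on this page.

```python
from bisect import bisect_left


def node_replacement_factor(cpu_nodes, cpu_throughput, gpu_throughput):
    """Estimate how many CPU-only servers match one GPU-accelerated server.

    cpu_nodes      -- measured CPU server counts, ascending (e.g. [1, 2, 4, 8])
    cpu_throughput -- measured throughput at each count, strictly ascending
    gpu_throughput -- throughput of the single GPU-accelerated server
    All throughputs use the same "bigger is better" metric.
    """
    if not cpu_nodes or len(cpu_nodes) != len(cpu_throughput):
        raise ValueError("need one throughput value per measured node count")

    # Outside the measured range: scale proportionally from the nearest
    # measured point (assumed interpretation of "linear scaling").
    if gpu_throughput <= cpu_throughput[0]:
        return cpu_nodes[0] * gpu_throughput / cpu_throughput[0]
    if gpu_throughput >= cpu_throughput[-1]:
        return cpu_nodes[-1] * gpu_throughput / cpu_throughput[-1]

    # Inside the measured range: interpolate between neighbouring points.
    hi = bisect_left(cpu_throughput, gpu_throughput)
    lo = hi - 1
    frac = (gpu_throughput - cpu_throughput[lo]) / (
        cpu_throughput[hi] - cpu_throughput[lo]
    )
    return cpu_nodes[lo] + frac * (cpu_nodes[hi] - cpu_nodes[lo])


def time_to_throughput(seconds):
    """Convert a "smaller is better" time metric into runs per second."""
    return 1.0 / seconds
```

With hypothetical CPU measurements of 10, 19, 36, and 64 units on 1, 2, 4, and 8 servers, `node_replacement_factor([1, 2, 4, 8], [10.0, 19.0, 36.0, 64.0], 640.0)` extrapolates from the 8-server point. For metrics where bigger is not better (for example, total time in seconds), the values would first be passed through `time_to_throughput`. The tables report NRF as whole numbers; the rounding rule is not stated on this page, so the sketch returns the unrounded estimate.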





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
Abaqus Direct Solver LS-EPP-Combined-WC-Mkl (RR) Total Time (Sec) no 4376.42 1856.00 1479.00 1856.00 1479.00
Abaqus Direct Solver LS-EPP-Combined-WC-Mkl (RR) NRF yes 1.00 4.00 9.00 4.00 9.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
ANSYS Fluent Waterjacket Total Time (Sec) no 1221.77 763.89 590.54 811.54
ANSYS Fluent Waterjacket NRF yes 1.00 3.00 4.00 3.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
Fun3D dpw_wbt0_crs-3.6Mn_5 Loop Time (Sec) no 612.00 50.00 26.00 18.00 49.00 25.00 18.00
Fun3D dpw_wbt0_crs-3.6Mn_5 NRF yes 1.00 14.00 26.00 38.00 14.00 27.00 38.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
RTM Isotropic Radius 4 Mcells/s yes 9432.00 84307.00 168405.00 336862 83989.00 167950.00 336021
RTM Isotropic Radius 4 NRF yes 1.00 9.00 18.00 36.00 9.00 18.00 36.00
RTM TTI Radius 8 1-pass Mcells/s yes 3144.00 15646.00 31167.00 62231 16635.00 33118.00 65779
RTM TTI Radius 8 1-pass NRF yes 1.00 5.00 10.00 20.00 5.00 11.00 21.00
RTM TTI RX 2Pass mgpu Mcells/s yes 3144.00 15023.00 29870.00 59594 15096.00 30003.00 59790
RTM TTI RX 2Pass mgpu NRF yes 1.00 5.00 10.00 19.00 5.00 10.00 19.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
SPECFEM3D four_material_simple_model Total Time (Sec) no 2807.00 77.00 41.00 25.00 77.00 41.00 26.00
SPECFEM3D four_material_simple_model NRF yes 1.00 43.00 82.00 134.00 43.00 82.00 129.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
Cloverleaf bm32 Wall Clock (Sec) no 2519.68 119.65 103.83 112.70 99.50
Cloverleaf bm32 NRF yes 1.00 16.00 18.00 17.00 19.00
                     





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
HPCG 256x256x256 local size GFLOPS yes 25.85 293.00 575.96 1056.00 293.00 575.96 1056.00
HPCG 256x256x256 local size NRF yes 1.00 11.00 22.00 41.00 11.00 22.00 41.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
Linpack HPL.dat (NB=[256] for GPU server, NB=[192] for CPU server) GFLOPS yes 1813 10090 19880 24155 10090 19880 24155
Linpack HPL.dat (NB=[256] for GPU server, NB=[192] for CPU server) NRF yes 1.00 5.00 11.00 13.00 5.00 11.00 13.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
MiniFE 350x350x350 Total CG Time (Sec) no 21.35 2.94 1.44 0.82 2.94 1.44 0.82
MiniFE 350x350x350 NRF yes 1.00 7.00 15.00 26.00 7.00 15.00 26.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
Relion MB numbers Plasmodium Ribosome on Relion-2.1 1/Minutes yes 0.00146 0.0147 0.0161 0.0142 0.0161
Relion MB numbers Plasmodium Ribosome on Relion-2.1 NRF yes 1.00 11.00 12.00 10.00 12.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
AMBER DHFR (NVE) (AKA JAC) ns/day yes 103.00 2150.00 4300.00 8600.00 2400.00 4800.00 9600.00
AMBER DHFR (NVE) (AKA JAC) NRF yes 1.00 21.00 42.00 83.00 23.00 47.00 93.00
AMBER Factor IX (NPT) ns/day yes 25.00 748.00 1496.00 2992.00 806.00 1612.00 3224.00
AMBER Factor IX (NPT) NRF yes 1.00 30.00 60.00 120.00 32.00 64.00 129.00
AMBER PME-Cellulose_NVE ns/day yes 5.20 212.00 424.00 848.00 226.00 452.00 904.00
AMBER PME-Cellulose_NVE NRF yes 1.00 41.00 82.00 163.00 43.00 87.00 174.00
AMBER STMV (NPT)  ns/day yes 1.80 64.00 128.00 256.00 66.00 132.00 264.00
AMBER STMV (NPT)  NRF yes 1.00 36.00 71.00 142.00 37.00 73.00 147.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
GROMACS ADH Dodec ns/day yes 54.00 176.00 193.00 175.00 201.00
GROMACS ADH Dodec NRF yes 1.00 5.00 9.00 5.00 9.00
GROMACS Cellulose ns/day yes 15.00 50.00 54.00
GROMACS Cellulose NRF yes 1.00 5.00 5.00
GROMACS (Projected) STMV ns/day yes 3.50 15 34
GROMACS (Projected) STMV NRF yes 1.00 5.00 11.00
GROMACS STMV ns/day yes 3.50 16.00 15.00
GROMACS STMV NRF yes 1.00 5.00 5.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
HOOMD-Blue microsphere Ave. TPS yes 11.89 298.15 371.06 466.88 329.43 506.09 688.99
HOOMD-Blue microsphere NRF yes 1.00 28.00 35.00 44.00 31.00 48.00 65.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
LAMMPS EAM ATOM-Time Steps/s yes 51790411 150151168 277921792 411025408 163856384 308977664 538206208
LAMMPS EAM NRF yes 1.00 3.00 6.00 10.00 4.00 7.00 12.00
LAMMPS LJ 2.6 ATOM-Time Steps/s yes 96625195 487342080 976592896 1229783040 514015232 1122000896 2018770944
LAMMPS LJ 2.6 NRF yes 1.00 6.00 11.00 14.00 6.00 12.00 22.00
LAMMPS ReaxFF/C ATOM-Time Steps/s yes 332697 2253472 3771040 5947466 2378146 3871433 6071206
LAMMPS ReaxFF/C NRF yes 1.00 15.00 25.00 39.00 15.00 25.00 39.00
LAMMPS Tersoff ATOM-Time Steps/s yes 39247062 380018688 672522240 824524800 423657472 769769472 1185472512
LAMMPS Tersoff NRF yes 1.00 10.00 18.00 22.00 11.00 20.00 31.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
NAMD apoa1_npt_cuda Ave ns/day yes 4.10 67.00 77.00 70.00 81.00
NAMD apoa1_npt_cuda NRF yes 1.00 117.00 134.00 122.00 141.00
NAMD apoa1_nptsr_cuda Ave ns/day yes 4.10 74.00 86.00 77.00 90.00
NAMD apoa1_nptsr_cuda NRF yes 1.00 129.00 150.00 134.00 157.00
NAMD apoa1_nve_cuda Ave ns/day yes 4.40 79.00 82.00 81.00 93.00
NAMD apoa1_nve_cuda NRF yes 1.00 92.00 95.00 94.00 108.00
NAMD stmv_npt_cuda Ave ns/day yes 0.38 7.10 7.90 6.50 7.10
NAMD stmv_npt_cuda NRF yes 1.00 32.00 35.00 29.00 32.00
NAMD stmv_nptsr_cuda Ave ns/day yes 0.38 7.90 8.80 6.80 7.60
NAMD stmv_nptsr_cuda NRF yes 1.00 37.00 41.00 32.00 36.00
NAMD stmv_nve_cuda Ave ns/day yes 0.38 8.50 9.30 7.40 8.50
NAMD stmv_nve_cuda NRF yes 1.00 34.00 37.00 30.00 34.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
Chroma szscl21_24_128 Total Time (Sec) no 1140.00 75.00 19.00 18.00 81.00 19.00 14.00
Chroma szscl21_24_128 NRF yes 1.00 28.00 109.00 116.00 26.00 109.00 149.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
GTC moi#proc.in Mpush/Sec yes 35.00 439.00 846.00 1430.00 457.00 877.00 1667.00
GTC moi#proc.in NRF yes 1.00 13.00 25.00 41.00 13.00 25.00 48.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
MILC Apex Medium Total Time (Sec) no 72194.00 3376 1637 1473 3292 1623 927
MILC Apex Medium NRF yes 1.00 25.00 51.00 57.00 25.00 52.00 90.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
QUDA QPhil Dslash Wilson-Clover (Precision: Single; Gau. Cmprsn/Recon: 12; Probl. Size 32x32x32x64) Dslash GFLOPS yes 111.60 2672.00 4760.58 5238.3 2663.97 5024.36 6292.29
QUDA QPhil Dslash Wilson-Clover (Precision: Single; Gau. Cmprsn/Recon: 12; Probl. Size 32x32x32x64) NRF yes 1.00 31.00 56.00 62.00 31.00 59.00 74.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
QE AUSURF112-jR Total CPU Time (Sec) no 740.00 200.00 99.00 190.00 94.00
QE AUSURF112-jR NRF yes 1.00 6.00 13.00 7.00 14.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
VASP B.hR105 Elapsed Time (Sec) no 661.00 123.00 80.00 125.00 84.00
VASP B.hR105 NRF yes 1.00 22.00 34.00 22.00 33.00
VASP Si-Huge Elapsed Time (Sec) no 4364.00 2094.00 1591.00 2263.00 1771.00
VASP Si-Huge NRF yes 1.00 4.00 9.00 3.00 9.00





Application Test Modules Metric Bigger is better Dual Skylake 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2
WRF Conus_2.5k_JA Seconds / Timestep no 5.20 0.68 0.52 0.62 0.38
WRF Conus_2.5k_JA NRF yes 1.00 9.00 11.00 9.00 15.00




Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
Fun3D dpw_wbt0_crs-3.6Mn_5 Loop Time (Sec) no 612.00 268.00 132.00 68.00
Fun3D dpw_wbt0_crs-3.6Mn_5 NRF yes 1.00 2.00 5.00 10.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
RTM Isotropic Radius 4 Mcells/s yes 9432.00 21987.00 43948.00 87930.00
RTM Isotropic Radius 4 NRF yes 1.00 2.00 5.00 9.00
RTM TTI Radius 8 1-pass Mcells/s yes 3144.00 5520.00 10962.00 21749.00
RTM TTI Radius 8 1-pass NRF yes 1.00 2.00 3.00 7.00
RTM TTI RX 2Pass mgpu Mcells/s yes 3144.00 4945.00 9816.00 19578.00
RTM TTI RX 2Pass mgpu NRF yes 1.00 2.00 3.00 6.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
SPECFEM3D four_material_simple_model Total Time (Sec) no 2807.00 207.00 105.00 57.00
SPECFEM3D four_material_simple_model NRF yes 1.00 16.00 32.00 59.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
Cloverleaf bm32 Wall Clock (Sec) no 2519.68 1002.00
Cloverleaf bm32 NRF yes 1.00 3.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
HPCG 256x256x256 local size GFLOPS yes 25.85 117.20 230.38 422.40
HPCG 256x256x256 local size NRF yes 1.00 4.00 9.00 16.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
MiniFE 350x350x350 Total CG Time (Sec) no 21.35 7.18 3.62 1.94
MiniFE 350x350x350 NRF yes 1.00 3.00 6.00 11.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
Relion MB numbers Plasmodium Ribosome on Relion-2.1 1/Minutes yes 0.00146 0.01 0.02
Relion MB numbers Plasmodium Ribosome on Relion-2.1 NRF yes 1.00 9.00 11.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
AMBER DHFR (NVE) (AKA JAC) ns/day yes 103.00 1094.00 2188.00 4376.00
AMBER DHFR (NVE) (AKA JAC) NRF yes 1.00 11.00 21.00 42.00
AMBER Factor IX (NPT) ns/day yes 25.00 302.00 604.00 1208.00
AMBER Factor IX (NPT) NRF yes 1.00 12.00 24.00 48.00
AMBER PME-Cellulose_NVE ns/day yes 5.20 66.00 132.00 264.00
AMBER PME-Cellulose_NVE NRF yes 1.00 13.00 25.00 51.00
AMBER STMV (NPT)  ns/day yes 1.80 22.00 44.00 88.00
AMBER STMV (NPT)  NRF yes 1.00 12.00 24.00 49.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
GROMACS ADH Dodec ns/day yes 54.00 129.00 153.00
GROMACS ADH Dodec NRF yes 1.00 3.00 5.00
GROMACS Cellulose ns/day yes 15.00 34.00 42.00 49.00
GROMACS Cellulose NRF yes 1.00 2.00 3.00 5.00
GROMACS STMV ns/day yes 3.50 9.30
GROMACS STMV NRF yes 1.00 3.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
HOOMD-Blue microsphere Ave. TPS yes 11.89 43.00 88.90 113.00
HOOMD-Blue microsphere NRF yes 1.00 4.00 8.00 11.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
LAMMPS EAM ATOM-Time Steps/s yes 51790411 88834048.00
LAMMPS EAM NRF yes 1.00 2.00
LAMMPS LJ 2.6 ATOM-Time Steps/s yes 96625195 162414592.00 318832640.00
LAMMPS LJ 2.6 NRF yes 1.00 2.00 3.00
LAMMPS ReaxFF/C ATOM-Time Steps/s yes 332697 904471.00
LAMMPS ReaxFF/C NRF yes 1.00 4.00
LAMMPS Tersoff ATOM-Time Steps/s yes 39247062 76619776.00 147668992.00
LAMMPS Tersoff NRF yes 1.00 2.00 4.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
NAMD apoa1_npt_cuda Ave ns/day yes 4.10 49.00 66.00
NAMD apoa1_npt_cuda NRF yes 1.00 85.00 115.00
NAMD apoa1_nptsr_cuda Ave ns/day yes 4.10 50.00 71.00
NAMD apoa1_nptsr_cuda NRF yes 1.00 87.00 123.00
NAMD apoa1_nve_cuda Ave ns/day yes 4.40 54.00 76.00
NAMD apoa1_nve_cuda NRF yes 1.00 63.00 88.00
NAMD stmv_npt_cuda Ave ns/day yes 0.38 4.40 6.90
NAMD stmv_npt_cuda NRF yes 1.00 20.00 31.00
NAMD stmv_nptsr_cuda Ave ns/day yes 0.38 4.50 7.10
NAMD stmv_nptsr_cuda NRF yes 1.00 21.00 33.00
NAMD stmv_nve_cuda Ave ns/day yes 0.38 4.70 7.70
NAMD stmv_nve_cuda NRF yes 1.00 19.00 31.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
Chroma szscl21_24_128 Total Time (Sec) no 1140.00 101.00 39.00 26.00
Chroma szscl21_24_128 NRF yes 1.00 21.00 53.00 80.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
GTC moi#proc.in Mpush/Sec yes 35.00 264.00 523.00 928.00
GTC moi#proc.in NRF yes 1.00 8.00 15.00 27.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
MILC Apex Medium Total Time (Sec) no 72194.00 7978.94 3835.19 2474.58
MILC Apex Medium NRF yes 1.00 10.00 22.00 34.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
QUDA QPhil Dslash Wilson-Clover (Precision: Single; Gau. Cmprsn/Recon: 12; Probl. Size 32x32x32x64) Dslash GFLOPS yes 111.60 1984.16 2319.25 3626.52
QUDA QPhil Dslash Wilson-Clover (Precision: Single; Gau. Cmprsn/Recon: 12; Probl. Size 32x32x32x64) NRF yes 1.00 23.00 27.00 43.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
VASP B.hR105 Elapsed Time (Sec) no 661.00 515.00 288.00 182.00
VASP B.hR105 NRF yes 1.00 2.00 10.00 15.00
VASP Si-Huge Elapsed Time (Sec) no 4364.00 2635.00
VASP Si-Huge NRF yes 1.00 2.00





Application Test Modules Metric Bigger is better Dual Skylake 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
WRF Conus_2.5k_JA Seconds / Timestep no 5.20 3.60 1.80 1.00
WRF Conus_2.5k_JA NRF yes 1.00 1.00 3.00 6.00