NVIDIA HPC Application Performance
Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The Tesla V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.
The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at NRF, we measure application performance with up to 8 CPU-only servers. Then we use linear scaling to scale beyond 8 servers to calculate the NRF. The NRF will vary by application.
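The NRF procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not NVIDIA's actual tooling. The function name `node_replacement_factor` is hypothetical. The sketch assumes that CPU-only throughput was measured for 1 through N servers (N = 8 here) and increases strictly with server count, that values between measured server counts can be linearly interpolated, and that beyond the largest measured cluster each extra server adds the same average per-server throughput. For time metrics, pass throughput as `1 / time`.

```python
from bisect import bisect_left


def node_replacement_factor(cpu_throughput: list[float], gpu_throughput: float) -> float:
    """Estimate how many CPU-only servers match one GPU-accelerated server.

    cpu_throughput[i] is the measured throughput (bigger is better) of
    i + 1 CPU-only servers. The list is assumed to be non-empty and
    strictly increasing. Hypothetical helper for illustration only.
    """
    n_measured = len(cpu_throughput)
    largest = cpu_throughput[-1]

    if gpu_throughput > largest:
        # Beyond the measured range: assume linear scaling, i.e. every extra
        # server adds the average per-server throughput of the largest cluster.
        per_server = largest / n_measured
        return gpu_throughput / per_server

    # Within the measured range: first index whose throughput reaches the GPU's.
    i = bisect_left(cpu_throughput, gpu_throughput)
    if i == 0:
        # Less than one CPU server's worth of work: scale down from one server.
        return gpu_throughput / cpu_throughput[0]

    # Interpolate between i servers (cpu_throughput[i - 1]) and i + 1 servers.
    lo, hi = cpu_throughput[i - 1], cpu_throughput[i]
    return i + (gpu_throughput - lo) / (hi - lo)


# Hypothetical CPU-only measurements for 1..8 servers with imperfect scaling.
measured = [10.0, 19.0, 27.0, 34.0, 40.0, 45.0, 49.0, 52.0]
print(node_replacement_factor(measured, 26.0))   # 2.875 (interpolated)
print(node_replacement_factor(measured, 104.0))  # 16.0 (extrapolated from 8 servers)
```

Because the CPU cluster in this made-up example scales sublinearly, the extrapolated NRF (16) is larger than the plain single-server ratio (104 / 10 = 10.4).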
HPC Benchmarks
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.0.130 for CloverLeaf, CUDA 10.1.243 for MiniFE
Engineering
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.1.105 for Abaqus/Standard, CUDA 9.0.176 for ANSYS Fluent, CUDA 10.0.130 for FUN3D
Geoscience
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.1.243 for RTM, CUDA 10.1.105 for SPECFEM3D
Microscopy and Molecular Dynamics
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.1.243
Physics
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.0.130, CUDA 9.0.103 for QUDA, CUDA 10.1.243 for MILC
Quantum Mechanics
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 9.2.88, CUDA 10.0.130 for VASP
Detailed V100 application performance data is located below in alphabetical order.
Abaqus/Standard

Engineering
Simulation tool for analysis of structures
VERSION
2019
ACCELERATED FEATURES
- Direct Sparse Solver
- AMS Eigen Solver
- Steady-state Dynamics Solver
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
https://www.3ds.com/products-services/simulia/products/abaqus/abaqusstandard/
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|
Abaqus/Standard | Total Time (Sec) | LS-EPP-Combined-WC-Mkl (RR) | no | 3,309 | 2,767 | 1,855 | 1,477 | 2,941 | 1,973 | 1,635 |
Abaqus/Standard | NRF | LS-EPP-Combined-WC-Mkl (RR) | yes | 1x | 1x | 2x | 2x | 1x | 2x | 2x |
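Many NRF rows in these tables track the speedup of one GPU server over one CPU-only server, rounded to the nearest integer. For time metrics, where "Bigger is better" is "no", that speedup is CPU time divided by GPU time. The sketch below reproduces the Abaqus/Standard PCIe NRF row this way. It is only an approximation: the NRF procedure described earlier also reflects how the CPU-only code scales across servers, so some rows, such as Chroma, differ from the simple ratio. The helper name `simple_nrf` is hypothetical.

```python
def simple_nrf(cpu_value: float, gpu_value: float, bigger_is_better: bool) -> int:
    """Approximate NRF as a single-server speedup, rounded to the nearest integer.

    Throughput metrics (bigger is better) use gpu / cpu.
    Time metrics (smaller is better) use cpu / gpu.
    Hypothetical helper. Published NRFs may also reflect CPU multi-node scaling.
    """
    speedup = gpu_value / cpu_value if bigger_is_better else cpu_value / gpu_value
    return round(speedup)


# Abaqus/Standard total time (sec): CPU-only vs. 1x, 2x, 4x V100 16GB PCIe.
cpu_time = 3309
gpu_times = [2767, 1855, 1477]
print([simple_nrf(cpu_time, t, bigger_is_better=False) for t in gpu_times])  # [1, 2, 2]
```

The same helper handles throughput rows. For example, `simple_nrf(4.73, 100, True)` gives 21 for AMBER [DC-Cellulose_NVE] on 1x V100 16GB PCIe, which matches the table.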
AMBER

Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
18.17-AT
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AMBER [DC-Cellulose_NVE] | ns/day | PME-Cellulose_NVE | yes | 4.73 | 100 | 199 | 398 | 797 | 106 | 212 | 424 | 847 |
AMBER [DC-Cellulose_NVE] | NRF | PME-Cellulose_NVE | yes | 1x | 21x | 42x | 84x | 168x | 22x | 45x | 90x | 179x |
AMBER [DC-FactorIX_NPT] | ns/day | Factor IX (NPT) | yes | 22.88 | 391 | 782 | 1,563 | 3,126 | 415 | 831 | 1,661 | 3,322 |
AMBER [DC-FactorIX_NPT] | NRF | Factor IX (NPT) | yes | 1x | 17x | 34x | 68x | 137x | 18x | 36x | 73x | 145x |
AMBER [DC-JAC_NVE] | ns/day | DHFR (NVE) (AKA JAC) | yes | 98.50 | 1,176 | 2,353 | 4,706 | 9,411 | 1,266 | 2,531 | 5,063 | 10,125 |
AMBER [DC-JAC_NVE] | NRF | DHFR (NVE) (AKA JAC) | yes | 1x | 12x | 24x | 48x | 96x | 13x | 26x | 51x | 103x |
AMBER [DC-STMV_NPT] | ns/day | STMV (NPT) | yes | 1.66 | 32 | 64 | 129 | 257 | 36 | 67 | 134 | 268 |
AMBER [DC-STMV_NPT] | NRF | STMV (NPT) | yes | 1x | 19x | 39x | 77x | 155x | 20x | 40x | 81x | 161x |
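The ns/day metric is easier to interpret as the wall-clock time needed for a target simulation length. The sketch below converts the AMBER [DC-STMV_NPT] rates into days needed to simulate one microsecond (1,000 ns). The 1 µs target is an illustrative choice, and the calculation assumes the measured rate holds constant for the whole run.

```python
def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to reach target_ns at a constant ns/day rate."""
    return target_ns / ns_per_day


# AMBER [DC-STMV_NPT] rates (ns/day) from the table above.
rates = {
    "Dual Cascade Lake 6240 (CPU-Only)": 1.66,
    "1x V100 16GB PCIe": 32,
    "8x V100 16GB SXM2": 268,
}
for config, rate in rates.items():
    print(f"{config}: {days_to_simulate(1000, rate):.1f} days per microsecond")
```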
ANSYS Fluent

Engineering
General purpose software for the simulation of fluid dynamics
VERSION
19.2
ACCELERATED FEATURES
- Pressure-based Coupled Solver and Radiation Heat Transfer
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|
ANSYS Fluent | Total Time (Sec) | Waterjacket | no | 1,216 | 1,119 | 927 | 606 | 1,034 | 792 | 667 |
ANSYS Fluent | NRF | Waterjacket | yes | 1x | 1x | 1x | 2x | 1x | 2x | 2x |
Chroma

Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
2018
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,083 | 1,118 | 72 | 16 | 17 | 1,098 | 70 | 18 | 12 |
Chroma | NRF | szscl21_24_128 | yes | 1x | 1x | 27x | 125x | 114x | 1x | 28x | 111x | 163x |
CloverLeaf

Benchmark
Hydrodynamics
VERSION
1.3
ACCELERATED FEATURES
- Lagrangian-Eulerian explicit hydrodynamics mini-application
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|
CloverLeaf | Wall Clock (Sec) | bm32 | no | 855 | - | 93 | 90 | - | 94 |
CloverLeaf | NRF | bm32 | yes | 1x | - | 10x | 10x | - | 9x |
FUN3D

Engineering
Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow
VERSION
13.4
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 486 | 98 | 50 | 27 | 18 | 94 | 49 | 25 | 19 |
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 6x | 11x | 21x | 31x | 6x | 12x | 22x | 30x |
GROMACS

Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2019.4
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 48.21 | 155 | 178 | 188 | 149 | 169 | 187 |
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 6x | 6x | 5x | 5x | 6x |
GROMACS [Cellulose] | ns/day | Cellulose | yes | 12.79 | 44 | 49 | 52 | 41 | 53 | 50 |
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 5x | 5x | 5x | 4x | 5x | 5x |
GROMACS [STMV] | ns/day | STMV | yes | 2.63 | 10 | 16 | 12 | 10 | 14 | 14 |
GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 6x | 5x | 4x | 6x | 6x |
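GROMACS and AMBER scale very differently from one to four GPUs. Parallel efficiency makes the difference explicit: E(n) = P(n) / (n · P(1)), where P(n) is throughput on n GPUs. This is a standard metric but is not reported on this page. The sketch applies it to the V100 16GB PCIe ns/day figures for GROMACS [ADH Dodec] and AMBER [DC-Cellulose_NVE] from the tables above. The helper name `scaling_efficiency` is hypothetical.

```python
def scaling_efficiency(throughput_by_gpus: dict[int, float]) -> dict[int, float]:
    """Parallel efficiency relative to the 1-GPU run: P(n) / (n * P(1))."""
    base = throughput_by_gpus[1]
    return {n: p / (n * base) for n, p in sorted(throughput_by_gpus.items())}


# ns/day on V100 16GB PCIe, taken from the tables in this document.
gromacs_adh = {1: 155, 2: 178, 4: 188}
amber_cellulose = {1: 100, 2: 199, 4: 398}

for name, data in [("GROMACS ADH Dodec", gromacs_adh), ("AMBER Cellulose", amber_cellulose)]:
    eff = scaling_efficiency(data)
    print(name, {n: f"{e:.0%}" for n, e in eff.items()})
```

AMBER stays close to 100% efficient at four GPUs. GROMACS drops to roughly 30%, which is consistent with its flat ns/day numbers beyond one GPU.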
GTC

Physics
GTC is used for gyrokinetic particle simulation of turbulent transport in burning plasmas.
VERSION
4.3
ACCELERATED FEATURES
- Push, shift, and collision
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | moi#proc.in | yes | 33 | 222 | 412 | 776 | 1,330 | 237 | 426 | 817 | 1,561 |
GTC | NRF | moi#proc.in | yes | 1x | 7x | 13x | 24x | 41x | 7x | 13x | 25x | 48x |
HOOMD-Blue

Molecular Dynamics
Particle dynamics package written from the ground up for GPUs
VERSION
2.5.2
ACCELERATED FEATURES
- CPU & GPU versions available
SCALABILITY
Multi-GPU and Single Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 | 1x V100 32GB PCIe | 2x V100 32GB PCIe | 4x V100 32GB PCIe | 8x V100 32GB PCIe |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HOOMD-Blue | Ave. TPS | microsphere | yes | 14.0 | 225 | 306 | 391 | 203 | 234 | 323 | 496 | 690 | 215 | 294 | 387 | 203 |
HOOMD-Blue | NRF | microsphere | yes | 1x | 18x | 25x | 31x | 16x | 19x | 26x | 40x | 56x | 17x | 24x | 31x | 16x |
HPCG

Benchmark
Exercises computational and data access patterns that closely match a broad set of important HPC applications
VERSION
3
ACCELERATED FEATURES
- All
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe |
---|---|---|---|---|---|---|---|
HPCG | GFLOPS | 256x256x256 local size | yes | 31 | - | 293 | 576 |
HPCG | NRF | 256x256x256 local size | yes | 1x | - | 9x | 19x |
LAMMPS

Molecular Dynamics
Classical molecular dynamics package
VERSION
stable_5Jun2019
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, and many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.13E+08 | 3.00E+08 | 5.37E+08 | 1.03E+09 | 1.75E+09 | 3.10E+08 | 5.63E+08 | 1.18E+09 | 2.16E+09 |
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 6x | 11x | 18x | 3x | 6x | 12x | 23x |
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.69E+07 | 1.08E+08 | 2.24E+08 | 4.02E+08 | 6.81E+08 | 1.15E+08 | 2.43E+08 | 4.34E+08 | 7.90E+08 |
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 5x | 8x | 14x | 2x | 5x | 9x | 17x |
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.61E+05 | 1.61E+06 | 2.80E+06 | 4.59E+06 | 7.01E+06 | 1.74E+06 | 2.88E+06 | 4.62E+06 | 7.18E+06 |
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 4x | 8x | 14x | 21x | 5x | 9x | 14x | 21x |
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.67E+07 | 1.94E+08 | 3.76E+08 | 6.70E+08 | 1.00E+09 | 2.26E+08 | 4.27E+08 | 7.85E+08 | 1.29E+09 |
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 4x | 8x | 14x | 20x | 5x | 9x | 16x | 26x |
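LAMMPS reports throughput in atom-timesteps per second, which folds system size into the metric. Dividing by the number of atoms gives timesteps per second for a specific run. The sketch below uses a hypothetical 2,048,000-atom system, which is not the size of the benchmark inputs. It also assumes the per-atom rate carries over unchanged to that size, which real runs need not satisfy. The helper names are hypothetical.

```python
def timesteps_per_second(atom_steps_per_s: float, n_atoms: int) -> float:
    """Convert atom-timesteps/s into timesteps/s for a given system size."""
    return atom_steps_per_s / n_atoms


def hours_for_run(n_steps: int, atom_steps_per_s: float, n_atoms: int) -> float:
    """Wall-clock hours to run n_steps, assuming the rate stays constant."""
    return n_steps / timesteps_per_second(atom_steps_per_s, n_atoms) / 3600


N_ATOMS = 2_048_000  # hypothetical system size, for illustration only

# LAMMPS [LJ 2.5] rates from the table: CPU-only vs. 8x V100 16GB SXM2.
for config, rate in [("CPU-only", 1.13e8), ("8x V100 SXM2", 2.16e9)]:
    print(f"{config}: {timesteps_per_second(rate, N_ATOMS):.1f} steps/s, "
          f"{hours_for_run(1_000_000, rate, N_ATOMS):.2f} h per 1M steps")
```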
Linpack

Benchmark
Measures floating point computing power
VERSION
2.1
ACCELERATED FEATURES
- All
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe |
---|---|---|---|---|---|---|---|
Linpack | GFLOPS | HPL.dat: NB=[256] for GPU server; NB=[192] for CPU server | yes | 2,176 | - | 10,090 | 19,880 |
Linpack | NRF | HPL.dat: NB=[256] for GPU server; NB=[192] for CPU server | yes | 1x | - | 5x | 9x |
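To put the Linpack GFLOPS figures in context, the sketch divides them by the number of GPUs and by an assumed FP64 peak of 7,000 GFLOPS per V100 PCIe card. This is the value commonly quoted on NVIDIA's V100 datasheet, so check it for your exact part. The sketch attributes all measured GFLOPS to the GPUs and ignores the host CPUs' contribution. The resulting fraction is therefore only a rough indicator of HPL efficiency. The constant and helper names are hypothetical.

```python
V100_PCIE_FP64_PEAK_GFLOPS = 7000.0  # assumed datasheet peak per GPU; verify for your part


def hpl_efficiency(measured_gflops: float, n_gpus: int,
                   peak_per_gpu: float = V100_PCIE_FP64_PEAK_GFLOPS) -> float:
    """Fraction of aggregate GPU FP64 peak achieved (ignores the CPUs)."""
    return measured_gflops / (n_gpus * peak_per_gpu)


# Linpack GFLOPS on V100 16GB PCIe servers, from the table above.
for n_gpus, gflops in [(2, 10_090), (4, 19_880)]:
    print(f"{n_gpus}x V100 PCIe: {gflops / n_gpus:,.0f} GFLOPS/GPU, "
          f"{hpl_efficiency(gflops, n_gpus):.0%} of assumed peak")
```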
MILC

Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
2019
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|
MILC | Total Time (Sec) | Apex Medium | no | 70,111 | - | 3,072 | 1,589 | - | 3,168 | 1,572 | 913 |
MILC | NRF | Apex Medium | yes | 1x | - | 25x | 49x | - | 24x | 49x | 85x |
MiniFE

Benchmark
Finite Element Analysis
VERSION
0.3
ACCELERATED FEATURES
- All
SCALABILITY
Multi-GPU
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
MiniFE | Total CG Time (Sec) | 350x350x350 | no | 20.21 | 5.70 | 2.98 | 1.47 | 0.81 | 5.75 | 2.91 | 1.45 | 0.82 |
MiniFE | NRF | 350x350x350 | yes | 1x | 3x | 7x | 14x | 25x | 3x | 7x | 14x | 25x |
NAMD

Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
2.13
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 7.10 | 61.62 | 73.35 | 86.77 | 60.67 | 72.42 | 82.67 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 9x | 10x | 12x | 9x | 10x | 12x |
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 7.10 | 63.44 | 81.72 | 96.82 | 63.82 | 79.96 | 93.70 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 9x | 12x | 14x | 9x | 11x | 13x |
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.40 | 68.88 | 84.33 | 103.0 | 67.89 | 83.51 | 97.31 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 9x | 11x | 14x | 9x | 11x | 13x |
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.65 | 6.00 | 7.62 | 8.39 | 5.10 | 6.62 | 7.26 |
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 9x | 12x | 13x | 8x | 10x | 11x |
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.65 | 6.18 | 8.13 | 9.54 | 5.29 | 6.83 | 7.75 |
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 10x | 13x | 15x | 8x | 11x | 12x |
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.66 | 6.79 | 8.72 | 9.77 | 5.70 | 7.49 | 8.52 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 19x | 24x | 27x | 16x | 21x | 24x |
NV-WRFg

Numerical Weather Prediction
Numerical weather prediction system designed for both atmospheric research and operational forecasting applications
VERSION
NV-WRFg 3.8.1
ACCELERATED FEATURES
- Dynamics modules
- Several Physics modules
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 8x V100 16GB PCIe |
---|---|---|---|---|---|---|
NV-WRFg | Seconds / Timestep | Conus_2.5k_JA | no | 5.51 | - | 0.74 |
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | - | 8x |
Quantum Espresso

Material Science (Quantum Chemistry)
An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale
VERSION
6.1
ACCELERATED FEATURES
- Linear algebra (matrix multiply)
- Explicit computational kernels
- 3D FFTs
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 724.0 | - | 200 | 99 | 90 | - | 190 | 94 | 79 |
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | - | 4x | 8x | 9x | - | 4x | 9x | 10x |
QUDA

Physics
A library for Lattice Quantum Chromodynamics on GPUs
VERSION
2017
ACCELERATED FEATURES
- All
SCALABILITY
Multi-GPU and Single Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
QUDA | Dslash GFLOPS | QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 | yes | 106 | 1,429 | 2,672 | 4,761 | 5,238 | 1,422 | 2,664 | 5,024 | 6,292 |
QUDA | NRF | QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 | yes | 1x | 13x | 25x | 45x | 49x | 13x | 25x | 47x | 59x |
RELION

Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
3
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|
RELION | 1/Minutes | Plasmodium Ribosome on Relion-3.0 | yes | 2.47E-03 | 9.24E-03 | 1.47E-02 | 1.63E-02 | 9.32E-03 | 1.47E-02 | 1.59E-02 |
RELION | NRF | Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 6x | 7x | 4x | 6x | 6x |
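The RELION metric is a reciprocal of run time, which is why bigger is better. Inverting it recovers the wall-clock duration of the benchmark run. The sketch assumes "1/Minutes" means the reciprocal of total runtime in minutes. The helper name `minutes_per_run` is hypothetical.

```python
def minutes_per_run(runs_per_minute: float) -> float:
    """Invert the RELION metric (1/Minutes) to get minutes per run."""
    return 1.0 / runs_per_minute


# RELION Plasmodium ribosome throughput (1/Minutes) from the table above.
for config, rate in [("CPU-only", 2.47e-3), ("1x V100 PCIe", 9.24e-3), ("4x V100 PCIe", 1.63e-2)]:
    minutes = minutes_per_run(rate)
    print(f"{config}: {minutes:.0f} min ({minutes / 60:.1f} h)")
```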
RTM

Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
2018
ACCELERATED FEATURES
- Batch algorithm
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 41,778 | 82,904 | 166,598 | 329,594 | 41,632 | 82,990 | 165,935 | 331,904 |
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 4x | 7x | 15x | 29x | 4x | 7x | 15x | 29x |
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 7,699 | 15,308 | 30,442 | 60,752 | 8,345 | 16,496 | 32,909 | 65,521 |
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 8x | 16x | 2x | 4x | 9x | 17x |
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,781 | 15,475 | 30,858 | 61,680 | 7,742 | 15,378 | 30,584 | 60,984 |
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 8x | 16x | 2x | 4x | 8x | 16x |
SPECFEM3D

Geoscience
Simulates seismic wave propagation
VERSION
github_a2d23d27
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
SCALABILITY
Multi-GPU and Single-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 2,114 | 149 | 77 | 41 | 24 | 148 | 77 | 41 | 24 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 17x | 32x | 60x | 103x | 17x | 32x | 61x | 105x |
VASP

Material Science (Quantum Chemistry)
Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set
VERSION
5.4.4
ACCELERATED FEATURES
- Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 |
---|---|---|---|---|---|---|---|---|---|---|---|
VASP [Si-Huge] | Elapsed Time (Sec) | Si-Huge | no | 3,535 | 1,869 | 1,595 | 1,125 | 1,959 | 1,702 | 1,342 | 1,331 |
VASP [Si-Huge] | NRF | Si-Huge | yes | 1x | 2x | 2x | 4x | 2x | 2x | 3x | 3x |
VASP [B.hR105] | Elapsed Time (Sec) | B.hR105 | no | 408 | 201 | 123 | 80 | 204 | 125 | 84 | 75 |
VASP [B.hR105] | NRF | B.hR105 | yes | 1x | 2x | 3x | 5x | 2x | 3x | 5x | 5x |
HPC Benchmarks
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.1.243
Engineering
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.0.130
Geoscience
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.1.243
Microscopy and Molecular Dynamics
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.1.243
Physics
CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.1.243, CUDA 10.0.130 for GTC
Detailed T4 application performance data is located below in alphabetical order.
AMBER

Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
18.17-AT
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
AMBER [DC-Cellulose_NVE] | ns/day | PME-Cellulose_NVE | yes | 4.7 | 65 | 130 | 261 |
AMBER [DC-Cellulose_NVE] | NRF | PME-Cellulose_NVE | yes | 1x | 14x | 28x | 55x |
AMBER [DC-FactorIX_NPT] | ns/day | Factor IX (NPT) | yes | 23 | 299 | 598 | 1,197 |
AMBER [DC-FactorIX_NPT] | NRF | Factor IX (NPT) | yes | 1x | 13x | 26x | 52x |
AMBER [DC-JAC_NVE] | ns/day | DHFR (NVE) (AKA JAC) | yes | 99 | 1,043 | 2,085 | 4,171 |
AMBER [DC-JAC_NVE] | NRF | DHFR (NVE) (AKA JAC) | yes | 1x | 11x | 21x | 42x |
AMBER [DC-STMV_NPT] | ns/day | STMV (NPT) | yes | 1.7 | 22 | 44 | 89 |
AMBER [DC-STMV_NPT] | NRF | STMV (NPT) | yes | 1x | 13x | 27x | 53x |
Chroma

Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
2018
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,083 | 119 | 38 | 22 |
Chroma | NRF | szscl21_24_128 | yes | 1x | 17x | 52x | 90x |
CloverLeaf

Benchmark
Hydrodynamics
VERSION
1.3
ACCELERATED FEATURES
- Lagrangian-Eulerian explicit hydrodynamics mini-application
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe |
---|---|---|---|---|---|
CloverLeaf | Wall Clock (Sec) | bm32 | no | 855 | 437 |
CloverLeaf | NRF | bm32 | yes | 1x | 2x |
FUN3D

Engineering
Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow.
VERSION
13.4
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 486 | 29 | 139 | 72 |
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 20x | 4x | 8x |
GROMACS

Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2019.4
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe |
---|---|---|---|---|---|---|
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 48 | 188 | 168 |
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 5x |
GROMACS [Cellulose] | ns/day | Cellulose | yes | 13 | 52 | 43 |
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 4x |
GROMACS [STMV] | ns/day | STMV | yes | 2.6 | 16 | 10 |
GROMACS [STMV] | NRF | STMV | yes | 1x | 7x | 4x |
GTC

Physics
GTC is used for gyrokinetic particle simulation of turbulent transport in burning plasmas.
VERSION
4.3
ACCELERATED FEATURES
- Push, shift, and collision
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | moi#proc.in | yes | 33 | 789 | 493 | 875 |
GTC | NRF | moi#proc.in | yes | 1x | 24x | 15x | 27x |
MILC

Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
2019
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
MILC | Total Time (Sec) | Apex Medium | no | 70,111 | 1,603 | 3,888 | 2,053 |
MILC | NRF | Apex Medium | yes | 1x | 48x | 20x | 38x |
MiniFE

Benchmark
Finite Element Analysis
VERSION
0.3
ACCELERATED FEATURES
- All
SCALABILITY
Multi-GPU
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
MiniFE | Total CG Time (Sec) | 350x350x350 | no | 20.2 | 1.5 | 3.7 | 2.0 |
MiniFE | NRF | 350x350x350 | yes | 1x | 13x | 6x | 10x |
NAMD

Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
2.13
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe |
---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 7.1 | 87 | 71 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 12x | 10x |
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 7.1 | 97 | 75 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 14x | 11x |
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.4 | 102 | 81 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 14x | 11x |
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.7 | 8 | 3 |
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 13x | 5x |
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.7 | 9 | 3 |
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 14x | 5x |
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.7 | 10 | 4 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 28x | 10x |
NV-WRFg

Numerical Weather Prediction
Numerical weather prediction system designed for both atmospheric research and operational forecasting applications
VERSION
NV-WRFg 3.8.1
ACCELERATED FEATURES
- Dynamics modules
- Several Physics modules
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe |
---|---|---|---|---|---|
NV-WRFg | Seconds / Timestep | Conus_2.5k_JA | no | 5.5 | 1.1 |
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 5x |
RELION

Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
3
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
RELION | 1/Minutes | Plasmodium Ribosome on Relion-3.0 | yes | 2.47E-03 | 1.23E-02 | 1.57E-02 | 1.66E-02 |
RELION | NRF | Plasmodium Ribosome on Relion-3.0 | yes | 1x | 5x | 6x | 7x |
RTM

Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
2018
ACCELERATED FEATURES
- Batch algorithm
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 148,708 | 58,700 | 117,772 |
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 13x | 5x | 10x |
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 28,787 | 11,697 | 23,297 |
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 8x | 3x | 6x |
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 28,625 | 11,732 | 23,417 |
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 8x | 3x | 6x |
SPECFEM3D

Geoscience
Simulates seismic wave propagation
VERSION
dvel_b7ed7a33
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
SCALABILITY
Multi-GPU and Single-Node
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
---|---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 2,114 | 44 | 106 | 57 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 56x | 23x | 44x |