Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers. The NRF varies by application.
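As a rough illustration of this definition, the sketch below computes a plain performance ratio between one GPU server and one CPU-only server. It assumes ideal linear CPU scaling from a single server, which simplifies the measure-then-extrapolate procedure described above, and the function name `node_replacement_factor` is hypothetical. Published NRF values need not equal this ratio: for the AMBER DC-Cellulose_NPT row on this page (4.47 vs 146 ns/day) the ratio rounds to the published 33x, while for the FUN3D row (573 s vs 52 s) it gives about 11 against a published 13x.

```python
def node_replacement_factor(cpu_value, gpu_value, bigger_is_better=True):
    """Approximate NRF as a plain performance ratio.

    Assumption: CPU performance scales linearly from a single server.
    """
    if cpu_value <= 0 or gpu_value <= 0:
        raise ValueError("metric values must be positive")
    if bigger_is_better:
        # Throughput-style metrics (ns/day, Mcells/s): higher is faster.
        return gpu_value / cpu_value
    # Time-style metrics (seconds): lower is faster, so invert the ratio.
    return cpu_value / gpu_value


# Values copied from the AMBER and FUN3D A100 rows on this page.
amber_ratio = node_replacement_factor(4.47, 146)
fun3d_ratio = node_replacement_factor(573, 52, bigger_is_better=False)
print(f"AMBER ratio: {amber_ratio:.1f}x, FUN3D ratio: {fun3d_ratio:.1f}x")
```

The `bigger_is_better` flag mirrors the "Bigger is better" column in the tables, so the same helper covers both throughput and elapsed-time metrics.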

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Platinum 8168@2.70GHz with 4x NVIDIA A100 80GB SXM4 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.1.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Platinum 8168@2.70GHz with 4x NVIDIA A100 80GB SXM4 or AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 40GB SXM4 | RTM Benchmark: A100 80GB SXM4, Isotropic Radius 4, CUDA Version: 11.1.0 | SPECFEM3D Benchmark: A100 40GB SXM4, four_material_simple_model, CUDA Version: 11.1.0

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Platinum 8168@2.70GHz with 4x NVIDIA A100 80GB SXM4 or AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 40GB SXM4 | AMBER Benchmark: A100 80GB SXM4, DC-STMV_NPT, CUDA Version: 11.1.0 | GROMACS Benchmark: A100 40GB SXM4, Cellulose, CUDA Version: 11.1.0 | LAMMPS Benchmark: A100 40GB SXM4, SNAP, CUDA Version: 11.1.0 | NAMD Benchmark: A100 80GB SXM4, stmv_nve_cuda, CUDA Version: 11.1.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 40GB SXM4 | GTC Benchmark: moi#proc.in, CUDA Version: 11.1.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.1.0

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 40GB SXM4 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.1.0


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.6-AT_20.10

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 80GB SXM4 | 2x A100 80GB SXM4 | 4x A100 80GB SXM4 | 8x A100 80GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.47 | 146 | 292 | 583 | 1,166 | 146 | 292 | 583 | 1,166
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 33x | 65x | 130x | 261x | 33x | 65x | 130x | 261x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.49 | 157 | 314 | 628 | 1,256 | 157 | 314 | 628 | 1,256
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 35x | 70x | 140x | 280x | 35x | 70x | 140x | 280x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.99 | 544 | 1,088 | 2,176 | 4,353 | 544 | 1,088 | 2,176 | 4,353
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 24x | 47x | 95x | 189x | 24x | 47x | 95x | 189x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.52 | 579 | 1,158 | 2,316 | 4,633 | 579 | 1,158 | 2,316 | 4,633
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 25x | 49x | 98x | 197x | 25x | 49x | 98x | 197x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 97.37 | 1,243 | 2,486 | 4,972 | 9,945 | 1,243 | 2,486 | 4,972 | 9,945
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 13x | 26x | 51x | 102x | 13x | 26x | 51x | 102x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 99.49 | 1,314 | 2,628 | 5,256 | 10,513 | 1,314 | 2,628 | 5,256 | 10,513
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 26x | 53x | 106x | 13x | 26x | 53x | 106x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.48 | 52 | 105 | 210 | 420 | 52 | 105 | 210 | 420
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 35x | 71x | 142x | 284x | 35x | 71x | 142x | 284x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2020.1

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | 8x A100 80GB SXM4 | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
Chroma | Total Time (Sec) | HMC-MG medium | no | 204 | 452 | 309

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications.

VERSION

13.6 Updated

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 80GB SXM4 | 2x A100 80GB SXM4 | 4x A100 80GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 573 | 52 | 29 | 18 | 59 | 31 | 19 | 18
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 13x | 24x | 37x | 11x | 21x | 36x | 38x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 40GB SXM4 | 2x A100 40GB SXM4 | 4x A100 40GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 57 | 300 | 324 | 542 | 300 | 319 | 473
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 7x | 8x | 13x | 7x | 8x | 11x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 90 | 135 | 233 | 89 | 124 | 160
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 8x | 13x | 22x | 8x | 12x | 15x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 19 | 35 | 54 | 19 | 34 | 52
GROMACS [STMV] | NRF | STMV | yes | 1x | 5x | 9x | 14x | 5x | 9x | 14x
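
The GROMACS rows above show throughput growing more slowly than GPU count. One way to quantify that is parallel scaling efficiency relative to the 1-GPU result. The following is a minimal sketch, not part of the published methodology: the helper name `scaling_efficiency` is hypothetical, and it applies only to "bigger is better" metrics such as ns/day.

```python
def scaling_efficiency(perf_by_gpu_count):
    """Return {gpu_count: efficiency} relative to the 1-GPU result.

    Efficiency is perf_n / (n * perf_1); 1.0 means perfect linear scaling.
    """
    base = perf_by_gpu_count[1]
    return {
        n: perf / (n * base)
        for n, perf in sorted(perf_by_gpu_count.items())
    }


# ns/day for GROMACS ADH Dodec on A100 40GB SXM4, copied from the table above.
adh_dodec_sxm4 = {1: 300, 2: 324, 4: 542}
for gpus, eff in scaling_efficiency(adh_dodec_sxm4).items():
    print(f"{gpus}x GPU: {eff:.0%} of linear scaling")
```

For time-based metrics (seconds), the same idea applies with the ratio inverted, that is, time_1 / (n * time_n).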

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 40GB SXM4 | 2x A100 40GB SXM4 | 4x A100 40GB SXM4 | 8x A100 40GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 292 | 866 | 1,623 | 3,381 | 315 | 839 | 1,440 | 2,599
GTC | NRF | moi#proc.in | yes | 1x | 9x | 25x | 47x | 98x | 9x | 24x | 42x | 76x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.1+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 40GB SXM4 | 2x A100 40GB SXM4 | 4x A100 40GB SXM4 | 8x A100 40GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 85 | 16 | 11 | 8 | 7 | 15 | 12
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 5x | 8x | 10x | 11x | 5x | 7x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 149 | 29 | 17 | 11 | 10 | 28 | 18
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 5x | 9x | 13x | 15x | 5x | 8x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_18Sep2020

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 40GB SXM4 | 2x A100 40GB SXM4 | 4x A100 40GB SXM4 | 8x A100 40GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.24E+08 | 5.45E+08 | 9.96E+08 | 1.96E+09 | 3.42E+09 | 5.11E+08 | 9.52E+08 | 1.78E+09 | 3.17E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 8x | 16x | 28x | 4x | 8x | 15x | 26x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 2.00E+08 | 3.69E+08 | 6.96E+08 | 9.17E+08 | 1.96E+08 | 3.55E+08 | 6.52E+08 | 8.96E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 6x | 12x | 16x | 3x | 6x | 11x | 16x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.95E+05 | 2.59E+06 | 4.20E+06 | 7.14E+06 | 1.05E+07 | 2.66E+06 | 4.43E+06 | 7.15E+06 | 1.03E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 9x | 14x | 25x | 36x | 9x | 15x | 25x | 35x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.17E+05 | 9.03E+05 | 1.74E+06 | 3.33E+06 | 6.02E+06 | 9.14E+05 | 1.76E+06 | 3.33E+06 | 6.07E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 9x | 17x | 32x | 59x | 9x | 17x | 32x | 59x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.62E+07 | 4.61E+08 | 8.00E+08 | 1.46E+09 | - | 4.36E+08 | 7.76E+08 | 1.32E+09 | -
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 10x | 17x | 32x | - | 9x | 17x | 29x | -
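
Cells in these tables mix comma-grouped numbers, scientific notation, "Nx" factors, and "-" entries. For anyone loading the figures into a script, a small normalising helper can be sketched as below. The name `parse_cell` is hypothetical, and treating "-" as "no result reported" is an assumption; the page does not say what the dash means.

```python
def parse_cell(cell):
    """Convert one table cell to a float, or None for a "-" entry."""
    text = cell.strip()
    if text == "-":
        # Assumption: a dash marks a result that was not reported.
        return None
    if text.endswith("x"):
        # NRF cells such as "28x": drop the trailing multiplier sign.
        text = text[:-1]
    # Removing commas handles "1,166"; float() already parses "1.24E+08".
    return float(text.replace(",", ""))


# LAMMPS Tersoff cells (CPU, then 1x/2x/4x/8x A100 40GB SXM4) from the table above.
row = ["4.62E+07", "4.61E+08", "8.00E+08", "1.46E+09", "-"]
print([parse_cell(c) for c in row])
```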

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_e0302ad4

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 40GB SXM4 | 2x A100 40GB SXM4 | 4x A100 40GB SXM4 | 8x A100 40GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
MILC | Total Time (Sec) | Apex Medium | no | 71,572 | 2,784 | 1,630 | 856 | 511 | 2,964 | 1,651 | 899 | 702
MILC | NRF | Apex Medium | yes | 1x | 28x | 48x | 92x | 154x | 27x | 48x | 87x | 112x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a7

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 80GB SXM4 | 2x A100 80GB SXM4 | 4x A100 80GB SXM4 | 8x A100 80GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.92 | 113 | 222 | 445 | 889 | 116 | 230 | 460 | 918
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 16x | 32x | 64x | 128x | 17x | 33x | 67x | 133x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.94 | 119 | 228 | 456 | 920 | 119 | 237 | 473 | 948
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 17x | 33x | 66x | 132x | 17x | 34x | 68x | 137x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.30 | 148 | 290 | 580 | 1,163 | 146 | 293 | 588 | 1,177
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 20x | 40x | 79x | 159x | 20x | 40x | 81x | 161x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.66 | 12 | 23 | 46 | 95 | 11 | 23 | 46 | 93
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 18x | 35x | 70x | 144x | 17x | 35x | 70x | 140x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.62 | 12 | 22 | 48 | 97 | 12 | 23 | 47 | 94
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 19x | 35x | 77x | 156x | 19x | 38x | 76x | 152x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.62 | 14 | 26 | 52 | 110 | 13 | 26 | 53 | 105
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 22x | 41x | 84x | 177x | 21x | 42x | 85x | 169x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.6a2 (GPU) / 6.4.1 (CPU)

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A100 40GB SXM4 | 4x A100 40GB SXM4 | 8x A100 40GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 724 | 164 | 77 | 56 | 263 | 147 | 115
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 5x | 10x | 14x | 3x | 5x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

V3.1.0

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 40GB SXM4 | 2x A100 40GB SXM4 | 4x A100 40GB SXM4 | 8x A100 40GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 1.33E+04 | 3.81E+03 | 2.19E+03 | 1.74E+03 | - | 3.90E+03 | 2.18E+03 | 1.80E+03 | 1.60E+03
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 6x | 8x | - | 3x | 6x | 7x | 8x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 1.62E+05 | 2.42E+04 | 1.28E+04 | 8.65E+03 | 6.23E+03 | 2.60E+04 | 1.33E+04 | 9.46E+03 | 6.62E+03
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 8x | 13x | 19x | 26x | 8x | 12x | 17x | 25x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_10

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 80GB SXM4 | 2x A100 80GB SXM4 | 4x A100 80GB SXM4 | 8x A100 80GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 89,626 | 178,663 | 356,983 | 714,246 | 75,166 | 149,834 | 299,671 | 598,085
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 32x | 63x | 7x | 13x | 26x | 53x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 13,282 | 26,529 | 52,138 | 104,225 | 12,731 | 25,094 | 49,916 | 98,780
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 4x | 7x | 14x | 28x | 3x | 7x | 13x | 26x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 13,832 | 27,509 | 54,351 | 107,899 | 11,996 | 23,703 | 46,852 | 93,350
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 14x | 29x | 3x | 6x | 12x | 25x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 40GB SXM4 | 2x A100 40GB SXM4 | 4x A100 40GB SXM4 | 8x A100 40GB SXM4 | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 91 | 47 | 25 | 18 | 91 | 47 | 26 | 18
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 25x | 48x | 90x | 125x | 25x | 48x | 89x | 128x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

V 6.1

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

www.vasp.at

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 | 1x A100 SXM4 | 2x A100 SXM4 | 4x A100 SXM4
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] | Elapsed Time (Sec) | GaAsBi-512 | no | 3,098 | 784 | 448 | 301
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] | NRF | GaAsBi-512 | yes | 1x | - | 11x | 17x
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] | Elapsed Time (Sec) | Si256_VJT_HSE06 | no | 3,315 | 653 | 383 | 225
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] | NRF | Si256_VJT_HSE06 | yes | 1x | 6x | 10x | 16x
VASP [Si-Huge (DFT-DAV-REAL-STD)] | Elapsed Time (Sec) | Si-Huge (DFT-DAV-REAL-STD) | no | 3,651 | 949 | 599 | 414
VASP [Si-Huge (DFT-DAV-REAL-STD)] | NRF | Si-Huge (DFT-DAV-REAL-STD) | yes | 1x | - | 7x | 10x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.1.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.1.0 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.1.0

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.1.0 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.1.0 | GROMACS Benchmark: Cellulose, CUDA Version: 11.1.0 | LAMMPS Benchmark: SNAP, CUDA Version: 11.1.0 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.1.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.1.0 | GTC Benchmark: moi#proc.in, CUDA Version: 11.1.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.1.0

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.1.0


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.6-AT_20.10

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.47 | 97 | 194 | 388 | 777 | 101 | 202 | 405 | 810
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 22x | 43x | 87x | 174x | 23x | 45x | 91x | 181x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.49 | 104 | 208 | 417 | 833 | 109 | 218 | 435 | 870
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 23x | 46x | 93x | 186x | 24x | 48x | 97x | 194x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.99 | 424 | 848 | 1,695 | 3,390 | 441 | 882 | 1,764 | 3,529
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 18x | 37x | 74x | 147x | 19x | 38x | 77x | 153x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.52 | 453 | 906 | 1,812 | 3,625 | 473 | 946 | 1,892 | 3,784
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 19x | 39x | 77x | 154x | 20x | 40x | 80x | 161x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 97.37 | 1,139 | 2,278 | 4,556 | 9,113 | 1,190 | 2,380 | 4,761 | 9,521
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 23x | 47x | 94x | 12x | 24x | 49x | 98x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 99.49 | 1,216 | 2,432 | 4,864 | 9,728 | 1,268 | 2,536 | 5,072 | 10,144
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 49x | 98x | 13x | 25x | 51x | 102x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.48 | 34 | 68 | 135 | 270 | 35 | 70 | 140 | 281
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 23x | 46x | 91x | 183x | 24x | 47x | 95x | 190x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2020.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,081 | 154 | 30 | 16 | 10 | 142 | 27 | 15
Chroma | NRF | szscl21_24_128 | yes | 1x | 7x | 36x | 68x | 105x | 8x | 41x | 74x

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications.

VERSION

13.6 Updated

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 573 | 95 | 49 | 26 | 19 | 84 | 43 | 23 | 18
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 7x | 14x | 26x | 36x | 8x | 15x | 29x | 37x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 57 | 198 | 256 | 448 | 208 | 245 | 306 | 206 | 266 | 328
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 6x | 11x | 5x | 6x | 7x | 5x | 6x | 8x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 58 | 92 | 144 | 54 | 79 | 87 | 61 | 88 | 99
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 4x | 9x | 14x | 4x | 6x | 8x | 4x | 8x | 9x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 12 | 25 | 41 | 12 | 25 | 33 | 12 | 25 | 37
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 6x | 11x | 3x | 6x | 9x | 3x | 6x | 10x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 210 | 536 | 1,075 | 2,079 | 219 | 562 | 1,118 | 1,829
GTC | NRF | moi#proc.in | yes | 1x | 6x | 16x | 31x | 61x | 6x | 16x | 33x | 53x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.1+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 85 | 24 | 15 | 10 | 9 | 21 | 14 | 11
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 6x | 9x | 10x | 4x | 6x | 8x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 149 | 46 | 26 | 16 | 13 | 39 | 23 | 16
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 3x | 6x | 10x | 12x | 4x | 6x | 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_18Sep2020

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.24E+08 | 2.92E+08 | 5.36E+08 | 1.16E+09 | 2.12E+09 | 2.93E+08 | 5.19E+08 | 1.10E+09 | 1.97E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 4x | 9x | 17x | 2x | 4x | 9x | 16x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 1.08E+08 | 2.24E+08 | 4.28E+08 | 6.60E+08 | 1.10E+08 | 2.24E+08 | 4.15E+08 | 6.40E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 4x | 7x | 11x | 2x | 4x | 7x | 11x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.95E+05 | 1.57E+06 | 2.81E+06 | 4.64E+06 | 7.34E+06 | 1.65E+06 | 2.95E+06 | 4.83E+06 | 7.55E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 5x | 10x | 16x | 25x | 5x | 10x | 17x | 26x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.17E+05 | 7.15E+05 | 1.39E+06 | 2.66E+06 | 5.03E+06 | 7.16E+05 | 1.40E+06 | 2.71E+06 | 5.14E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 7x | 14x | 26x | 49x | 7x | 14x | 26x | 50x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.62E+07 | 2.22E+08 | 3.98E+08 | 7.92E+08 | 9.87E+08 | 2.36E+08 | 4.06E+08 | 7.77E+08 | 9.47E+08
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 9x | 17x | 21x | 5x | 9x | 17x | 21x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_e0302ad4

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
MILC | Total Time (Sec) | Apex Medium | no | 71,572 | 5,470 | 2,696 | 1,451 | 851 | 4,264 | 2,374 | 1,334
MILC | NRF | Apex Medium | yes | 1x | 14x | 29x | 54x | 92x | 18x | 33x | 59x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a7

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 8x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.92 | 93 | 183 | 367 | 734 | 61 | 121 | 241 | 483 | 97 | 192 | 385 | 772
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 13x | 26x | 53x | 106x | 9x | 17x | 35x | 70x | 14x | 28x | 56x | 112x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.94 | 98 | 191 | 384 | 768 | 65 | 128 | 256 | 513 | 102 | 200 | 398 | 798
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 14x | 28x | 55x | 111x | 9x | 18x | 37x | 74x | 15x | 29x | 57x | 115x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.30 | 124 | 245 | 497 | 984 | 86 | 173 | 348 | 689 | 127 | 253 | 506 | 1,011
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 17x | 34x | 68x | 135x | 12x | 24x | 48x | 94x | 17x | 35x | 69x | 139x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.66 | 8 | 15 | 31 | 63 | 5 | 11 | 21 | 43 | 8 | 16 | 32 | 65
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 12x | 23x | 47x | 95x | 8x | 16x | 32x | 65x | 12x | 24x | 49x | 98x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.62 | 8 | 16 | 32 | 65 | 6 | 11 | 22 | 45 | 9 | 16 | 33 | 66
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 13x | 26x | 52x | 104x | 9x | 18x | 36x | 72x | 14x | 26x | 53x | 107x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.62 | 9 | 18 | 36 | 71 | 7 | 13 | 26 | 53 | 9 | 18 | 36 | 72
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 14x | 29x | 58x | 115x | 11x | 21x | 43x | 85x | 15x | 29x | 59x | 117x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x V100 32GB SXM2 | 4x V100S PCIe 32GB
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 6 | 0.62 | 0.68
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 10x | 9x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.6a2 (GPU) / 6.4.1 (CPU)

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 724 | 371 | 199 | 111 | 75 | 362 | 197 | 116 | 94
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 2x | 4x | 7x | 11x | 2x | 4x | 7x | 9x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

V3.1.0

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 1.33E+04 | 5.22E+03 | 3.09E+03 | 2.41E+03 | - | 5.17E+03 | 2.99E+03 | 2.33E+03 | -
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 4x | 6x | - | 3x | 4x | 6x | -
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 1.62E+05 | 3.49E+04 | 1.84E+04 | 1.24E+04 | 8.66E+03 | 3.62E+04 | 1.86E+04 | 1.24E+04 | 8.62E+03
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 6x | 9x | 13x | 19x | 6x | 9x | 13x | 19x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_10

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 38,119 | 76,011 | 152,084 | 304,147 | 46,116 | 92,016 | 183,566 | 367,708
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 7x | 13x | 27x | 4x | 8x | 16x | 32x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,492 | 16,795 | 33,132 | 65,943 | 9,108 | 18,121 | 35,982 | 71,912
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 9x | 17x | 2x | 5x | 10x | 19x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,184 | 14,279 | 28,391 | 56,685 | 8,521 | 16,911 | 33,657 | 67,131
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 8x | 15x | 2x | 4x | 9x | 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 159 | 82 | 43 | 25 | 133 | 69 | 37 | 22
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 14x | 28x | 52x | 91x | 17x | 33x | 61x | 104x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

V 6.1

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

www.vasp.at

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] | Elapsed Time (Sec) | GaAsBi-512 | no | 3,098 | 1,175 | 653 | 368
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] | NRF | GaAsBi-512 | yes | 1x | - | 7x | 14x
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] | Elapsed Time (Sec) | Si256_VJT_HSE06 | no | 3,315 | 1,100 | 600 | 307
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] | NRF | Si256_VJT_HSE06 | yes | 1x | 3x | 6x | 12x
VASP [Si-Huge (DFT-DAV-REAL-STD)] | Elapsed Time (Sec) | Si-Huge (DFT-DAV-REAL-STD) | no | 3,651 | 1,376 | 784 | 518
VASP [Si-Huge (DFT-DAV-REAL-STD)] | NRF | Si-Huge (DFT-DAV-REAL-STD) | yes | 1x | 3x | 5x | 8x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.1.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.1.0 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.1.0

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.1.0 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.1.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.1.0 | GTC Benchmark: moi#proc.in, CUDA Version: 11.1.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.1.0


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.6-AT_20.10

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.47 | 66 | 133 | 265 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 15x | 30x | 59x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.49 | 68 | 136 | 273 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 15x | 30x | 61x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.99 | 322 | 644 | 1,287 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 14x | 28x | 56x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.52 | 330 | 660 | 1,320 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 14x | 28x | 56x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 97.37 | 1,158 | 2,317 | 4,634 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 48x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 99.49 | 1,179 | 2,359 | 4,718 |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 47x |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.48 | 23 | 47 | 93 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 16x | 31x | 63x |

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2020.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,081 | 121 | 40 | 23 |
| Chroma | NRF | szscl21_24_128 | yes | 1x | 9x | 27x | 47x |

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow.

VERSION

13.6 Updated

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 573 | 279 | 141 | 73 |
| FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 9x |
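
For rows where "Bigger is better" reads "no" (elapsed times), a raw speedup can be read off as the CPU-only time divided by the GPU time. The sketch below does this for the FUN3D loop times above. The helper name `raw_speedup` is hypothetical, and the ratio it returns is not the NRF: the NRF is derived from measurements on up to 8 CPU-only servers with linear scaling beyond that, so the published values (2x, 5x, 9x) can differ from the raw ratios.

```python
def raw_speedup(cpu_value, gpu_value, bigger_is_better):
    """Ratio of GPU-server to CPU-server performance for one benchmark row.

    For throughput metrics (bigger is better) this is gpu / cpu;
    for elapsed-time metrics (smaller is better) it is cpu / gpu.
    This is a plain ratio, not the published NRF.
    """
    if cpu_value <= 0 or gpu_value <= 0:
        raise ValueError("metric values must be positive")
    if bigger_is_better:
        return gpu_value / cpu_value
    return cpu_value / gpu_value


# FUN3D loop time in seconds, copied from the table above.
fun3d_cpu_seconds = 573
fun3d_gpu_seconds = {"2x T4 PCIe": 279, "4x T4 PCIe": 141, "8x T4 PCIe": 73}

speedups = {
    config: raw_speedup(fun3d_cpu_seconds, seconds, bigger_is_better=False)
    for config, seconds in fun3d_gpu_seconds.items()
}

for config, value in speedups.items():
    print(f"{config}: {value:.2f}x raw speedup")
```

The same helper applies to throughput rows such as ns/day or Mcells/s by passing `bigger_is_better=True`.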

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- |
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 57 | 127 | 239 |
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 6x |
| GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 42 | 64 |
| GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 5x |
| GROMACS [STMV] | ns/day | STMV | yes | 4 | 10 | 17 |
| GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x |
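
One way to read the multi-GPU columns is to check how close throughput comes to doubling when the GPU count doubles. The sketch below computes that for the GROMACS ns/day rows above. The function name `scaling_efficiency` and the definition used (throughput gain divided by GPU-count gain) are illustrative assumptions, not a metric defined on this page, and the inputs are the rounded table values, so the results are approximate.

```python
def scaling_efficiency(perf_small, perf_large, gpus_small, gpus_large):
    """Throughput gain divided by GPU-count gain (1.0 means perfect scaling)."""
    if min(perf_small, perf_large, gpus_small, gpus_large) <= 0:
        raise ValueError("all inputs must be positive")
    return (perf_large / perf_small) / (gpus_large / gpus_small)


# ns/day for 2x and 4x T4 PCIe, copied from the table above.
gromacs_ns_per_day = {
    "ADH Dodec": (127, 239),
    "Cellulose": (42, 64),
    "STMV": (10, 17),
}

efficiencies = {
    name: scaling_efficiency(two_gpu, four_gpu, 2, 4)
    for name, (two_gpu, four_gpu) in gromacs_ns_per_day.items()
}

for name, value in efficiencies.items():
    print(f"{name}: {value:.2f}")
```

With these inputs, ADH Dodec scales from 2 to 4 GPUs more efficiently than Cellulose does.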

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

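| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GTC | Mpush/Sec | moi#proc.in | yes | 35 | 264 | 518 | 954 |
| GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 28x |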

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.1+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 85 | 37 | 21 | - |
| ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 2x | 4x | - |
| ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 149 | 80 | 43 | 32 |
| ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 3x | 5x |

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_e0302ad4

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- |
| MILC | Total Time (Sec) | Apex Medium | no | 71,572 | 7,090 | 3,778 |
| MILC | NRF | Apex Medium | yes | 1x | 11x | 21x |

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a7

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.92 | 56 | 112 | 226 |
| NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 8x | 16x | 33x |
| NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.94 | 58 | 117 | 236 |
| NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 8x | 17x | 34x |
| NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.30 | 74 | 149 | 301 |
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 10x | 20x | 41x |
| NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.66 | 4 | 9 | 18 |
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 13x | 27x |
| NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.62 | 5 | 9 | 18 |
| NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 7x | 15x | 30x |
| NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.62 | 5 | 11 | 21 |
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 9x | 17x | 34x |

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- |
| NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.51 | 0.90 |
| NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 7x |

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

V3.1.0

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 1.33E+04 | 6.45E+03 | 3.89E+03 | 2.75E+03 |
| Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 2x | 3x | 5x |
| Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 1.62E+05 | 3.60E+04 | 2.05E+04 | 1.25E+04 |
| Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 6x | 8x | 13x |
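
The Relion wall-clock times above are given in seconds using scientific notation, which is easier to compare after conversion to hours. The sketch below converts the Plasmodium Ribosome (2D) row. The helper name `seconds_to_hours` is hypothetical, and the values are copied from the table as printed, so they carry only three significant digits.

```python
def seconds_to_hours(value):
    """Convert seconds (a number or a string such as '1.62E+05') to hours."""
    return float(value) / 3600.0


# Total wall clock (seconds) for Plasmodium Ribosome (2D), from the table above.
relion_2d_wall_clock = {
    "Dual Cascade Lake 6240 (CPU-Only)": "1.62E+05",
    "2x T4 PCIe": "3.60E+04",
    "4x T4 PCIe": "2.05E+04",
    "8x T4 PCIe": "1.25E+04",
}

relion_2d_hours = {
    config: seconds_to_hours(seconds)
    for config, seconds in relion_2d_wall_clock.items()
}

for config, hours in relion_2d_hours.items():
    print(f"{config}: {hours:.1f} h")
```

Read this way, the CPU-only run takes about 45 hours and the 2x T4 PCIe run about 10 hours.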

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_10

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 29,440 | 58,978 | 117,878 |
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 5x | 10x |
| RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 6,527 | 12,617 | 25,152 |
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 7x |
| RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 5,906 | 11,719 | 23,401 |
| RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 3x | 6x |

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

| Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 207 | 105 | 57 |
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 11x | 22x | 40x |