

Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then use linear scaling to extrapolate beyond 8 servers. The NRF varies by application.
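The NRF calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the exact procedure behind the published numbers: the function name `node_replacement_factor`, the interpolation between measured CPU server counts, and the sample throughput values are all hypothetical. Beyond the largest measured CPU configuration, the sketch assumes perfect linear scaling from that point.

```python
def node_replacement_factor(cpu_throughput, gpu_throughput):
    """Estimate how many CPU-only servers match one GPU server.

    cpu_throughput: dict mapping CPU server count -> measured throughput
                    (bigger is better), e.g. measured at 1, 2, 4, 8 servers.
    gpu_throughput: throughput of the single GPU-accelerated server.
    """
    points = sorted(cpu_throughput.items())
    first_nodes, first_perf = points[0]
    last_nodes, last_perf = points[-1]

    # Below the smallest measured configuration: scale down proportionally.
    if gpu_throughput <= first_perf:
        return first_nodes * gpu_throughput / first_perf

    # Inside the measured range: interpolate between neighbouring points
    # (an assumption of this sketch).
    for (n0, p0), (n1, p1) in zip(points, points[1:]):
        if p0 <= gpu_throughput <= p1 and p1 > p0:
            return n0 + (n1 - n0) * (gpu_throughput - p0) / (p1 - p0)

    # Beyond the largest measured configuration: linear scaling.
    return last_nodes * gpu_throughput / last_perf


# Hypothetical CPU measurements (not taken from the tables on this page).
cpu_runs = {1: 100.0, 2: 190.0, 4: 360.0, 8: 680.0}

print(node_replacement_factor(cpu_runs, 275.0))   # inside the measured range
print(node_replacement_factor(cpu_runs, 1360.0))  # extrapolated beyond 8 servers
```

For metrics where smaller is better, such as a total time in seconds, 1/time can be passed as the throughput.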

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.0 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.0

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.0 | GROMACS Benchmark: Cellulose, CUDA Version: 11.0 | LAMMPS Benchmark: SNAP, CUDA Version: 11.0 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.0 | GTC Benchmark: moi#proc.in, CUDA Version: 11.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.0

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.0


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.01-AT_20.05

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB 8x A100 PCIe 40GB
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.36 143 286 572 1,144 141 282 565 1,129
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 33x 66x 131x 262x 32x 65x 130x 259x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.37 155 310 621 1,242 152 304 608 1,215
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 36x 71x 142x 284x 35x 70x 139x 278x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 22.26 522 1,044 2,088 4,175 529 1,058 2,117 4,234
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 23x 47x 94x 188x 24x 48x 95x 190x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 22.90 558 1,115 2,231 4,462 565 1,130 2,259 4,519
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 24x 49x 97x 195x 25x 49x 99x 197x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 93.83 1,198 2,396 4,793 9,586 1,234 2,467 4,934 9,868
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 13x 26x 51x 102x 13x 26x 53x 105x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 96.63 1,271 2,542 5,085 10,170 1,301 2,603 5,205 10,410
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 13x 26x 53x 105x 13x 27x 54x 108x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.43 52 105 210 420 51 102 204 409
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 37x 73x 147x 294x 36x 71x 143x 286x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2020.0

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB 8x A100 PCIe 40GB
Chroma Total Time (Sec) szscl21_24_128 no 1,130 60 22 12 8 58 21 12 8
Chroma NRF szscl21_24_128 yes 1x 19x 51x 91x 151x 20x 53x 94x 149x
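For metrics where bigger is not better, such as the Chroma total time above, a single-server speedup is the ratio of CPU time to GPU time. The Python sketch below applies that ratio to the A100 SXM4 values from the table. The function name `speedup_from_times` is hypothetical, and the results need not equal the published NRF, which is derived from multi-server CPU measurements and presumably from unrounded timings.

```python
def speedup_from_times(cpu_seconds, gpu_seconds):
    """Speedup for a lower-is-better metric: CPU time divided by GPU time."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# Chroma szscl21_24_128, total time in seconds, copied from the table above.
chroma_cpu_seconds = 1130
chroma_a100_sxm4_seconds = {1: 60, 2: 22, 4: 12, 8: 8}

chroma_speedups = {
    gpus: speedup_from_times(chroma_cpu_seconds, seconds)
    for gpus, seconds in chroma_a100_sxm4_seconds.items()
}

for gpus, speedup in chroma_speedups.items():
    print(f"{gpus}x A100 SXM4: {speedup:.1f}x faster than the CPU-only server")
```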

FUN3D

Engineering

Suite of fluid-flow modeling tools actively developed at NASA for aeronautics and space technology.

VERSION

13.6

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 526 56 30 19 58 31 19
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 11x 21x 33x 11x 20x 33x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.1

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB
GROMACS [ADH Dodec] ns/day ADH Dodec yes 58 300 325 508 - 304 329 504
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 7x 8x 12x - 7x 8x 12x
GROMACS [Cellulose] ns/day Cellulose yes 17 90 130 202 258 90 126 186
GROMACS [Cellulose] NRF Cellulose yes 1x 8x 12x 18x 23x 8x 11x 17x
GROMACS [STMV] ns/day STMV yes 4 19 35 54 99 19 34 53
GROMACS [STMV] NRF STMV yes 1x 4x 9x 14x 25x 4x 9x 13x
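One way to read the multi-GPU columns above is as scaling efficiency: the throughput on N GPUs divided by N times the single-GPU throughput. The Python sketch below applies this to the GROMACS Cellulose row for A100 SXM4. The function name `scaling_efficiency` is hypothetical, and the metric is an illustration rather than one this page reports.

```python
def scaling_efficiency(throughput_by_gpus):
    """Return {gpu_count: efficiency}, relative to the smallest GPU count.

    An efficiency of 1.0 means throughput grew in proportion to GPU count.
    """
    base_gpus = min(throughput_by_gpus)
    per_gpu_baseline = throughput_by_gpus[base_gpus] / base_gpus
    return {
        gpus: perf / (gpus * per_gpu_baseline)
        for gpus, perf in sorted(throughput_by_gpus.items())
    }


# GROMACS [Cellulose], ns/day on A100 SXM4, copied from the table above.
cellulose_ns_per_day = {1: 90, 2: 130, 4: 202, 8: 258}

cellulose_efficiency = scaling_efficiency(cellulose_ns_per_day)
for gpus, efficiency in cellulose_efficiency.items():
    print(f"{gpus}x A100 SXM4: {efficiency:.0%} scaling efficiency")
```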

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB 8x A100 PCIe 40GB
GTC Mpush/Sec moi#proc.in yes 35 353 623 1,186 2,269 357 616 1,157 2,204
GTC NRF moi#proc.in yes 1x 10x 18x 35x 66x 10x 18x 34x 64x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.0+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 130 15 11 9 8 15 12 9
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 8x 12x 14x 16x 8x 11x 14x
ICON [SLAM 191 - 160KM - with radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 300 30 19 13 11 30 19 14
ICON [SLAM 191 - 160KM - with radiation] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 10x 16x 23x 27x 10x 16x 22x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_3Mar2020
patch_24Aug2020

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB 8x A100 PCIe 40GB
LAMMPS [LJ 2.5] ATOM-Time Steps/s LJ 2.5 yes 1.24E+08 5.40E+08 9.90E+08 1.86E+09 3.33E+09 5.10E+08 9.49E+08 1.75E+09 3.16E+09
LAMMPS [LJ 2.5] NRF LJ 2.5 yes 1x 4x 8x 15x 27x 4x 8x 14x 26x
LAMMPS [EAM] ATOM-Time Steps/s EAM yes 5.84E+07 1.98E+08 3.63E+08 6.38E+08 8.36E+08 1.90E+08 3.46E+08 6.34E+08 8.77E+08
LAMMPS [EAM] NRF EAM yes 1x 3x 6x 11x 15x 3x 6x 11x 15x
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 3.94E+05 2.56E+06 4.02E+06 6.04E+06 8.51E+06 2.62E+06 4.25E+06 6.68E+06 9.01E+06
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 9x 14x 21x 29x 9x 15x 23x 31x
LAMMPS [SNAP] ATOM-Time Steps/s SNAP yes 1.10E+05 8.18E+05 1.56E+06 2.84E+06 4.91E+06 8.20E+05 1.52E+06 2.71E+06 4.42E+06
LAMMPS [SNAP] NRF SNAP yes 1x 8x 16x 29x 51x 8x 16x 28x 46x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 4.64E+07 4.36E+08 7.69E+08 1.30E+09 - 4.16E+08 7.53E+08 1.24E+09 -
LAMMPS [Tersoff] NRF Tersoff yes 1x 9x 17x 28x - 9x 16x 27x -

LAMMPS[SNAP] Version: patch_24Aug2020 | LAMMPS[LJ 2.5], LAMMPS[EAM], LAMMPS[ReaxFF/C], LAMMPS[Tersoff] Version: stable_3Mar2020

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_f8533104

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB 8x A100 PCIe 40GB
MILC Total Time (Sec) Apex Medium no 67,299 3,424 1,990 1,054 592 3,727 1,965 1,078 740
MILC NRF Apex Medium yes 1x 22x 37x 70x 125x 20x 38x 69x 100x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a5

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU,

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB 8x A100 PCIe 40GB
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 6.89 114 222 437 886 115 230 456 913
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 17x 32x 63x 129x 17x 33x 66x 133x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 6.97 118 232 454 912 119 235 470 943
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 17x 33x 65x 131x 17x 34x 67x 135x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 7.31 147 291 583 1,164 148 296 591 1,174
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 20x 40x 80x 159x 20x 41x 81x 161x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 0.65 12 23 45 91 11 22 45 91
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 18x 35x 69x 140x 17x 34x 70x 140x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 0.66 12 24 45 92 11 23 46 93
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 18x 36x 68x 140x 17x 34x 70x 141x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 0.66 14 27 51 107 13 26 52 106
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 21x 41x 77x 162x 19x 39x 79x 161x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.5 (GPU) / 6.4.1 (CPU)

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB
Quantum Espresso Total CPU Time (Sec) AUSURF112-jR no 724 145 86 63 273 160 133
Quantum Espresso NRF AUSURF112-jR yes 1x 6x 9x 13x 3x 5x 6x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_02

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB 8x A100 PCIe 40GB
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 68,272 136,360 271,483 544,904 68,381 136,204 264,553 529,277
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 6x 12x 24x 48x 6x 12x 23x 47x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 13,012 25,687 51,586 102,539 12,858 25,508 49,301 98,821
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 3x 7x 14x 27x 3x 7x 13x 26x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 12,238 24,228 48,097 95,443 12,240 24,032 46,994 93,607
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 3x 6x 13x 25x 3x 6x 12x 25x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0caec104

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4 1x A100 PCIe 40GB 2x A100 PCIe 40GB 4x A100 PCIe 40GB 8x A100 PCIe 40GB
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,866 89 47 25 18 89 47 25 17
SPECFEM3D NRF four_material_simple_model yes 1x 24x 46x 85x 118x 24x 46x 84x 126x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

V 6.1

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

www.vasp.at

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] Elapsed Time (Sec) GaAsBi-512 no 3,098 784 448 301
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] NRF GaAsBi-512 yes 1x - 11x 17x
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] Elapsed Time (Sec) Si256_VJT_HSE06 no 3,315 653 383 225
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] NRF Si256_VJT_HSE06 yes 1x 6x 10x 16x
VASP [Si-Huge (DFT-DAV-REAL-STD)] Elapsed Time (Sec) Si-Huge (DFT-DAV-REAL-STD) no 3,651 949 599 414
VASP [Si-Huge (DFT-DAV-REAL-STD)] NRF Si-Huge (DFT-DAV-REAL-STD) yes 1x - 7x 10x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.0 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.0

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 10.2 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.0 | GROMACS Benchmark: Cellulose, CUDA Version: 11.0 | LAMMPS Benchmark: SNAP, CUDA Version: 11.0 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.0 | GTC Benchmark: moi#proc.in, CUDA Version: 11.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.0

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.0


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.01-AT_20.05

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.36 96 193 385 770 100 201 401 803
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 22x 44x 88x 177x 23x 46x 92x 184x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.37 103 206 411 823 108 215 431 861
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 24x 47x 94x 188x 25x 49x 99x 197x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 22.26 420 840 1,680 3,361 438 876 1,753 3,506
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 19x 38x 75x 151x 20x 39x 79x 157x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 22.90 448 897 1,793 3,587 468 935 1,870 3,741
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 20x 39x 78x 157x 20x 41x 82x 163x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 93.83 1,131 2,261 4,523 9,046 1,178 2,356 4,711 9,422
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 12x 24x 48x 96x 13x 25x 50x 100x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 96.63 1,198 2,395 4,790 9,580 1,255 2,510 5,021 10,041
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 12x 25x 50x 99x 13x 26x 52x 104x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.43 34 67 135 270 35 70 139 279
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 24x 47x 94x 189x 24x 49x 98x 195x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2020.0

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB
Chroma Total Time (Sec) szscl21_24_128 no 1,130 139 31 16 131 27 15
Chroma NRF szscl21_24_128 yes 1x 8x 37x 71x 9x 42x 78x

CryoSPARC

Microscopy

CryoSPARC is a state-of-the-art scientific software platform for cryo-electron microscopy (cryo-EM) used in research and drug discovery pipelines

VERSION

V2.11.0

ACCELERATED FEATURES

  • All-GPU application. The new cryoSPARC Live product allows streaming HPC reconstruction as the microscope acquires data.

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://cryosparc.com/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB
CryoSPARC [hetero_refine_6] Total Time (Sec) hetero_refine_6 no 5,185 2,288 1,649 2,199 1,580
CryoSPARC [hetero_refine_6] NRF hetero_refine_6 yes 1x 2x 3x 2x 3x
CryoSPARC [class_2D_200] Total Time (Sec) class_2D_200 no 13,297 2,349 1,896 2,189 1,714
CryoSPARC [class_2D_200] NRF class_2D_200 yes 1x 6x 7x 6x 8x

FUN3D

Engineering

Suite of fluid-flow modeling tools actively developed at NASA for aeronautics and space technology.

VERSION

13.6

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 526 97 49 26 19 86 44 24 19
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 6x 13x 24x 32x 7x 14x 26x 33x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.1

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 1x RTX6000 2x RTX6000 4x RTX6000 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB
GROMACS [ADH Dodec] ns/day ADH Dodec yes 58 197 255 446 208 247 309 204 265 328
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 5x 6x 10x 5x 6x 7x 5x 6x 8x
GROMACS [Cellulose] ns/day Cellulose yes 17 57 93 144 54 79 87 60 87 98
GROMACS [Cellulose] NRF Cellulose yes 1x 4x 8x 13x 3x 6x 6x 4x 6x 9x
GROMACS [STMV] ns/day STMV yes 4 12 25 41 12 25 32 12 25 36
GROMACS [STMV] NRF STMV yes 1x 3x 6x 10x 3x 6x 8x 3x 6x 9x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
GTC Mpush/Sec moi#proc.in yes 35 231 425 812 1,501 231 430 825 1,607
GTC NRF moi#proc.in yes 1x 7x 12x 24x 44x 7x 13x 24x 47x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.0+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 130 24 15 10 9 21 14 11
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 5x 9x 13x 15x 6x 9x 12x
ICON [SLAM 191 - 160KM - with radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 300 48 27 17 14 41 24 17
ICON [SLAM 191 - 160KM - with radiation] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 6x 11x 18x 21x 7x 12x 18x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_3Mar2020
patch_24Aug2020

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] ATOM-Time Steps/s LJ 2.5 yes 1.24E+08 2.89E+08 5.34E+08 1.16E+09 2.12E+09 2.93E+08 5.17E+08 1.10E+09 1.98E+09
LAMMPS [LJ 2.5] NRF LJ 2.5 yes 1x 2x 4x 9x 17x 2x 4x 9x 16x
LAMMPS [EAM] ATOM-Time Steps/s EAM yes 5.84E+07 1.06E+08 2.20E+08 4.18E+08 6.43E+08 1.07E+08 2.18E+08 4.04E+08 6.28E+08
LAMMPS [EAM] NRF EAM yes 1x 2x 4x 7x 11x 2x 4x 7x 11x
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 3.94E+05 1.53E+06 2.71E+06 4.48E+06 6.89E+06 1.62E+06 2.85E+06 4.58E+06 7.06E+06
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 5x 9x 15x 24x 5x 10x 16x 24x
LAMMPS [SNAP] ATOM-Time Steps/s SNAP yes 1.10E+05 5.70E+05 1.11E+06 2.11E+06 3.88E+06 5.86E+05 1.14E+06 2.17E+06 3.95E+06
LAMMPS [SNAP] NRF SNAP yes 1x 6x 11x 22x 40x 6x 12x 23x 41x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 4.64E+07 2.07E+08 3.91E+08 7.43E+08 9.45E+08 2.19E+08 4.16E+08 7.28E+08 8.89E+08
LAMMPS [Tersoff] NRF Tersoff yes 1x 5x 8x 16x 20x 5x 9x 16x 19x

LAMMPS[SNAP] Version: patch_24Aug2020 | LAMMPS[LJ 2.5], LAMMPS[EAM], LAMMPS[ReaxFF/C], LAMMPS[Tersoff] Version: stable_3Mar2020

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_f8533104

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
MILC Total Time (Sec) Apex Medium no 67,299 5,528 3,057 1,587 893 4,781 2,685 1,455 1,179
MILC NRF Apex Medium yes 1x 13x 24x 47x 83x 15x 28x 51x 63x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a5

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU,

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x RTX6000 2x RTX6000 4x RTX6000 8x RTX6000 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 6.89 94 183 365 732 61 121 242 483 99 191 386 765
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 14x 27x 53x 106x 9x 18x 35x 70x 14x 28x 56x 111x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 6.97 98 191 383 766 65 128 254 511 101 199 397 794
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 14x 27x 55x 110x 9x 18x 36x 73x 14x 29x 57x 114x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 7.31 127 252 504 1,011 88 175 347 698 131 256 518 1,029
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 17x 34x 69x 138x 12x 24x 47x 95x 18x 35x 71x 141x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 0.65 8 16 31 63 5 11 21 43 8 16 32 65
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 12x 24x 48x 97x 8x 16x 33x 66x 13x 24x 50x 100x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 0.66 8 16 32 65 6 11 22 45 8 16 33 66
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 12x 24x 49x 98x 8x 17x 34x 68x 13x 25x 51x 100x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 0.66 9 18 36 72 7 13 27 53 9 18 37 73
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 14x 27x 55x 109x 10x 20x 40x 81x 14x 27x 55x 111x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 4x V100 32GB SXM2 4x V100S PCIe 32GB
NV-WRFg Seconds / Timestep Conus_2.5k_JA no 6 0.62 0.68
NV-WRFg NRF Conus_2.5k_JA yes 1x 10x 9x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.5 (GPU) / 6.4.1 (CPU)

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
Quantum Espresso Total CPU Time (Sec) AUSURF112-jR no 724 355 200 120 85 344 197 129 105
Quantum Espresso NRF AUSURF112-jR yes 1x 2x 4x 7x 9x 2x 4x 6x 8x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

V3.0.8

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
Relion [Plasmodium Ribosome] Total Wall Clock (Sec) MB numbers Plasmodium Ribosome on Relion-3.0 no 1.32E+04 5.75E+03 3.46E+03 2.75E+03 - 5.78E+03 3.43E+03 2.87E+03 -
Relion [Plasmodium Ribosome] NRF MB numbers Plasmodium Ribosome on Relion-3.0 yes 1x 2x 4x 5x - 2x 4x 5x -
Relion [Plasmodium Ribosome 2D] Total Wall Clock (Sec) Plasmodium Ribosome (2D) no 1.62E+05 2.82E+04 1.58E+04 9.88E+03 6.86E+03 2.86E+04 1.68E+04 1.06E+04 7.00E+03
Relion [Plasmodium Ribosome 2D] NRF Plasmodium Ribosome (2D) yes 1x 7x 10x 17x 24x 7x 10x 16x 23x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_02

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 36,354 72,594 145,197 290,135 42,934 85,654 171,081 342,418
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 3x 6x 13x 26x 4x 8x 15x 30x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 7,601 15,093 29,885 59,518 8,106 16,162 32,100 64,054
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 2x 4x 8x 16x 2x 4x 9x 17x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 7,178 14,265 28,358 56,608 8,493 16,905 33,624 67,039
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 2x 4x 8x 15x 2x 4x 9x 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0caec104

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,866 159 82 43 25 133 69 37 22
SPECFEM3D NRF four_material_simple_model yes 1x 13x 26x 49x 85x 16x 31x 58x 98x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

V 6.1

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

www.vasp.at

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] | Elapsed Time (Sec) | GaAsBi-512 | no | 3,098 | 1,175 | 653 | 368
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] | NRF | GaAsBi-512 | yes | 1x | - | 7x | 14x
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] | Elapsed Time (Sec) | Si256_VJT_HSE06 | no | 3,315 | 1,100 | 600 | 307
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] | NRF | Si256_VJT_HSE06 | yes | 1x | 3x | 6x | 12x
VASP [Si-Huge (DFT-DAV-REAL-STD)] | Elapsed Time (Sec) | Si-Huge (DFT-DAV-REAL-STD) | no | 3,651 | 1,376 | 784 | 518
VASP [Si-Huge (DFT-DAV-REAL-STD)] | NRF | Si-Huge (DFT-DAV-REAL-STD) | yes | 1x | 3x | 5x | 8x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.0 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.0

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 10.2 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.0 | GTC Benchmark: moi#proc.in, CUDA Version: 11.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.0


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.01-AT_20.05

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.36 | 66 | 133 | 265
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 15x | 30x | 61x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 68 | 136 | 272
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 16x | 31x | 62x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.26 | 323 | 645 | 1,290
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 14x | 29x | 58x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.90 | 332 | 665 | 1,329
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 29x | 58x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.83 | 1,168 | 2,336 | 4,673
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 25x | 50x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 96.63 | 1,195 | 2,389 | 4,778
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 25x | 49x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 23 | 47 | 93
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 16x | 33x | 65x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2020.0

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,130 | 116 | 37 | 25
Chroma | NRF | szscl21_24_128 | yes | 1x | 10x | 30x | 46x

CryoSPARC

Microscopy

CryoSPARC is a state-of-the-art scientific software platform for cryo-electron microscopy (cryo-EM) used in research and drug discovery pipelines

VERSION

V2.11.0

ACCELERATED FEATURES

  • All-GPU application. The new cryoSPARC Live product allows streaming HPC reconstruction as the microscope acquires data.

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://cryosparc.com/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x T4 PCIe
CryoSPARC [hetero_refine_6] | Total Time (Sec) | hetero_refine_6 | no | 5,185 | 2,121
CryoSPARC [hetero_refine_6] | NRF | hetero_refine_6 | yes | 1x | 2x
CryoSPARC [class_2D_200] | Total Time (Sec) | class_2D_200 | no | 13,297 | 2,253
CryoSPARC [class_2D_200] | NRF | class_2D_200 | yes | 1x | 6x

FUN3D

Engineering

Suite of fluid-flow modeling tools actively developed at NASA for aeronautics and space technology.

VERSION

13.6

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Fun3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 526 | 279 | 141 | 73
Fun3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 4x | 8x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.1

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 127 | 237 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 6x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 42 | 64 | 71
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 4x | 5x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 10 | 17 | 26
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 6x
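
The GROMACS rows above show throughput growing more slowly than the GPU count. One rough way to quantify this is a scaling efficiency relative to the smallest GPU configuration, sketched below. The helper name `scaling_efficiency` and the choice of the 2x T4 result as the baseline are assumptions made for illustration; they are not part of the published methodology.

```python
from __future__ import annotations


def scaling_efficiency(throughput_by_gpus: dict[int, float]) -> dict[int, float]:
    """Efficiency of each configuration relative to the smallest GPU count.

    1.0 means throughput grew in direct proportion to the number of GPUs;
    values below 1.0 mean sub-linear scaling.
    """
    base_gpus = min(throughput_by_gpus)
    base_throughput = throughput_by_gpus[base_gpus]
    return {
        gpus: (value / base_throughput) / (gpus / base_gpus)
        for gpus, value in sorted(throughput_by_gpus.items())
    }


# ns/day on 2x, 4x and 8x T4 PCIe, copied from the table.
gromacs_cellulose = {2: 42, 4: 64, 8: 71}
gromacs_stmv = {2: 10, 4: 17, 8: 26}

cellulose_eff = scaling_efficiency(gromacs_cellulose)
stmv_eff = scaling_efficiency(gromacs_stmv)

for name, eff in (("Cellulose", cellulose_eff), ("STMV", stmv_eff)):
    for gpus, value in eff.items():
        print(f"{name} on {gpus}x T4: {value:.0%} of linear scaling")
```

With these inputs, Cellulose drops to roughly 76% at 4 GPUs and 42% at 8 GPUs, while STMV holds at 85% and 65%. Because the table values are rounded, the percentages are approximate.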

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 238 | 460 | 909
GTC | NRF | moi#proc.in | yes | 1x | 7x | 13x | 26x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.0+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 130 | 36 | 21 | 18
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 4x | 6x | 7x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 300 | 82 | 45 | 32
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 4x | 7x | 9x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_f8533104

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 67,299 | 7,165 | 3,751 | 2,546
MILC | NRF | Apex Medium | yes | 1x | 10x | 20x | 29x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a5

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU,

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.89 | 56 | 111 | 223
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 8x | 16x | 32x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.97 | 58 | 116 | 233
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 8x | 17x | 33x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.31 | 75 | 150 | 303
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 10x | 20x | 42x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.65 | 4 | 9 | 18
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 14x | 27x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.66 | 5 | 9 | 18
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 7x | 14x | 28x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.66 | 5 | 11 | 21
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 8x | 16x | 32x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.51 | 0.90
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

V3.0.8

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 1.32E+04 | 4.71E+03 | 3.55E+03 | -
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 4x | -
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 1.62E+05 | 2.85E+04 | 1.59E+04 | 9.12E+03
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 7x | 10x | 18x
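
The tables on this page mix several cell formats: thousands separators (1,866), scientific notation (1.32E+04), NRF values with a trailing x, and a dash in some cells. For readers who want to load the numbers into a script, the sketch below normalizes one cell at a time. The function name `parse_cell` is hypothetical, and treating "-" as "no result reported" is an assumption, since the page does not say what the dash means.

```python
from typing import Optional


def parse_cell(cell: str) -> Optional[float]:
    """Convert one table cell to a float, or None for a '-' placeholder.

    Handles thousands separators ("1,866"), scientific notation
    ("1.32E+04") and NRF multipliers with a trailing "x" ("18x").
    """
    text = cell.strip()
    if text == "-":
        # Assumption: a dash marks a configuration with no reported result.
        return None
    if text.endswith("x"):
        text = text[:-1]
    return float(text.replace(",", ""))


# Relion [Plasmodium Ribosome] Total Wall Clock (Sec) row from the table.
relion_row = ["1.32E+04", "4.71E+03", "3.55E+03", "-"]
relion_seconds = [parse_cell(cell) for cell in relion_row]
print(relion_seconds)
```

Returning `None` for a dash keeps such cells distinct from a measured zero, so downstream arithmetic can skip them explicitly.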

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_02

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 29,423 | 58,883 | 117,690
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 5x | 10x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 5,951 | 11,576 | 23,090
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 6x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 5,921 | 11,742 | 23,397
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 3x | 6x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0caec104

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,866 | 206 | 105 | 56
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 10x | 20x | 38x