NVIDIA HPC Application Performance

Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. NVIDIA data center GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, lower power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers that a single GPU-accelerated server replaces is called the node replacement factor (NRF). To derive the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers to find how many CPU-only servers would match the GPU-accelerated server. The NRF varies by application.
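The NRF calculation described above can be sketched in Python. This is a minimal, hypothetical sketch, not NVIDIA's tooling: the function names are illustrative, CPU-only results are assumed to rise monotonically with node count, results between measured node counts are linearly interpolated, and beyond the largest measured count (8 servers here) throughput is assumed to grow in proportion to node count at the per-server rate of that last measurement. Lower-is-better metrics such as wall time are converted to rates by taking the reciprocal first.

```python
from bisect import bisect_left


def node_replacement_factor(cpu_rate: dict[int, float], gpu_rate: float) -> float:
    """Estimate how many CPU-only servers match one GPU-accelerated server.

    cpu_rate maps each measured CPU-only node count (e.g. 1, 2, 4, 8) to a
    bigger-is-better result such as ns/day; results are assumed to rise
    with node count. gpu_rate is the GPU server's result in the same unit.
    """
    nodes = sorted(cpu_rate)
    rates = [cpu_rate[n] for n in nodes]

    if gpu_rate <= rates[0]:
        # At or below the smallest measurement: scale that point proportionally.
        return nodes[0] * gpu_rate / rates[0]

    if gpu_rate > rates[-1]:
        # Beyond the largest measurement: assume linear scaling at the
        # per-server rate observed at the largest measured node count.
        per_server = rates[-1] / nodes[-1]
        return gpu_rate / per_server

    # Inside the measured range: interpolate between the bracketing node counts.
    i = bisect_left(rates, gpu_rate)
    n0, n1 = nodes[i - 1], nodes[i]
    r0, r1 = rates[i - 1], rates[i]
    return n0 + (gpu_rate - r0) / (r1 - r0) * (n1 - n0)


def rate_from_seconds(seconds: float) -> float:
    """Turn a lower-is-better wall time into a bigger-is-better rate (runs per second)."""
    return 1.0 / seconds


# Hypothetical CPU-only scaling data, not taken from the tables on this page.
cpu = {1: 10.0, 2: 18.0, 4: 32.0, 8: 56.0}
print(node_replacement_factor(cpu, 25.0))             # 3.0 (between 2 and 4 servers)
print(round(node_replacement_factor(cpu, 100.0), 2))  # 14.29 (extrapolated past 8)

# Single-server baseline: AMBER PME-Cellulose_NPT, 4.13 ns/day CPU vs 318 on 1x H100 SXM.
print(round(node_replacement_factor({1: 4.13}, 318.0)))  # 77
```

With only a single-server CPU measurement, the estimate reduces to the plain throughput ratio; for the AMBER PME-Cellulose_NPT row on 1x H100 SXM that gives 318 / 4.13, or about 77, in line with the 77x in the AMBER table. Under this sketch, CPU-only results that scale less than linearly across servers push the NRF above that simple ratio.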

Detailed H100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.13 318 633 1,264
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 77x 153x 306x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.12 313 649 1,260
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 76x 157x 306x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 20.71 1,326 2,680 5,330
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 64x 129x 257x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 20.95 1,356 2,721 5,416
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 65x 130x 259x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 84.61 4,540 9,197 17,967
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 54x 109x 212x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 85.16 4,590 9,301 20,152
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 54x 109x 237x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.38 85 170 340
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 62x 123x 246x
AMBER [FEP-GTI_Complex 1fs] ns/day FEP-GTI_Complex yes 9.89 194 388 776
AMBER [FEP-GTI_Complex 1fs] NRF FEP-GTI_Complex yes 1x 20x 39x 78x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 495 30 17 10
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 21x 37x 59x
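For lower-is-better rows such as FUN3D Loop Time, the single-server speedup is the CPU-only time divided by the GPU time. The sketch below (the helper name `speedup` is illustrative) applies that ratio to the H100 numbers in the table above. The published NRF values are larger than these ratios, which is consistent with the NRF being derived from measured multi-server CPU scaling, as described at the top of this page, rather than from the single-server time alone. The multi-server CPU timings are not published here, so this does not reproduce NVIDIA's calculation.

```python
# FUN3D dpw_wbt0_crs-3.6Mn_5 loop times in seconds, copied from the H100 table.
cpu_seconds = 495.0
gpu_seconds = {1: 30.0, 2: 17.0, 4: 10.0}
published_nrf = {1: 21, 2: 37, 4: 59}


def speedup(cpu_s: float, gpu_s: float) -> float:
    """Single-server speedup for a lower-is-better time metric."""
    return cpu_s / gpu_s


for gpus, seconds in gpu_seconds.items():
    s = speedup(cpu_seconds, seconds)
    print(f"{gpus}x H100: speedup {s:.1f}x vs published NRF {published_nrf[gpus]}x")
```

Running it prints speedups of 16.5x, 29.1x and 49.5x against published NRFs of 21x, 37x and 59x.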

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
GROMACS [ADH Dodec] ns/day ADH Dodec yes 67 626 723 896
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 12x 14x 18x
GROMACS [Cellulose] ns/day Cellulose yes 19 189 246 350
GROMACS [Cellulose] NRF Cellulose yes 1x 14x 19x 27x
GROMACS [STMV] ns/day STMV yes 4 42 68 115
GROMACS [STMV] NRF STMV yes 1x 10x 17x 28x
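One way to read the multi-GPU columns is strong-scaling efficiency: the N-GPU throughput divided by N times the 1-GPU throughput. The sketch below applies that standard definition (it is not a metric this page reports, and the helper name is illustrative) to the GROMACS H100 rows above.

```python
# GROMACS ns/day on 1x, 2x and 4x H100 SXM, copied from the table above.
results = {
    "ADH Dodec": {1: 626.0, 2: 723.0, 4: 896.0},
    "Cellulose": {1: 189.0, 2: 246.0, 4: 350.0},
    "STMV": {1: 42.0, 2: 68.0, 4: 115.0},
}


def scaling_efficiency(rates: dict[int, float], gpus: int) -> float:
    """Strong-scaling efficiency relative to 1 GPU (1.0 means perfectly linear)."""
    return rates[gpus] / (gpus * rates[1])


for name, rates in results.items():
    eff = [f"{g}x: {scaling_efficiency(rates, g):.0%}" for g in (2, 4)]
    print(name, ", ".join(eff))
```

On these rows the efficiency drops as GPUs are added, most steeply for ADH Dodec (about 36% at 4 GPUs) and least for STMV (about 68%).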

GTC

Physics

GTC is used for gyrokinetic particle simulation of turbulent transport in burning plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
GTC Mpush/Sec moi#proc.in yes 35 758 1,370 2,441
GTC NRF moi#proc.in yes 1x 22x 40x 71x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 2,431 204 149 113
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 12x 16x 22x
ICON [QUBICC 160 km resolution] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 2,213 188 133 101
ICON [QUBICC 160 km resolution] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 12x 17x 22x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://lammps.sandia.gov/index.html

https://ngc.nvidia.com/catalog/containers/hpc:lammps

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
LAMMPS [LJ 2.5] ATOM-Time Steps/s LJ 2.5 yes 1.11E+08 1.07E+09 1.93E+09 3.35E+09
LAMMPS [LJ 2.5] NRF LJ 2.5 yes 1x 10x 18x 31x
LAMMPS [EAM] ATOM-Time Steps/s EAM yes 5.33E+07 5.13E+08 9.12E+08 1.60E+09
LAMMPS [EAM] NRF EAM yes 1x 10x 18x 31x
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 4.45E+05 1.02E+07 1.84E+07 3.04E+07
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 31x 57x 94x
LAMMPS [SNAP] ATOM-Time Steps/s SNAP yes 1.08E+05 3.87E+06 7.69E+06 1.52E+07
LAMMPS [SNAP] NRF SNAP yes 1x 37x 74x 147x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 2.77E+07 9.16E+08 1.64E+09 2.95E+09
LAMMPS [Tersoff] NRF Tersoff yes 1x 34x 60x 108x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
MILC Total Time (Sec) Apex Medium no 71,595 1,172 634 355
MILC NRF Apex Medium yes 1x 67x 124x 222x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Multi-GPU, single node; handles systems of up to 100M atoms

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 19.15 284 549 1,048
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 15x 29x 55x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 19.59 291 570 1,124
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 15x 29x 57x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 20.75 363 698 1,386
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 17x 34x 67x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 1.87 23 45 89
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 12x 24x 48x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 1.81 23 46 92
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 13x 26x 51x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 1.94 27 54 108
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 14x 28x 56x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 124,975 249,251 498,066
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 11x 22x 44x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 22,109 44,135 88,094
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 6x 12x 23x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 21,704 43,088 85,784
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 6x 11x 23x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x H100 SXM 2x H100 SXM 4x H100 SXM
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,268 46 24 14
SPECFEM3D NRF four_material_simple_model yes 1x 32x 59x 105x

Detailed L40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x L40 2x L40 4x L40 8x L40
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 20.71 779 1,575 3,147 6,303
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 38x 76x 152x 304x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 20.95 797 1,613 3,182 6,428
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 38x 77x 152x 307x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 84.61 3,270 6,561 12,901 26,316
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 39x 78x 152x 311x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 85.16 3,297 6,656 13,250 26,477
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 39x 78x 156x 311x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.38 62 124 248 497
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 45x 90x 180x 360x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x L40 2x L40 4x L40 8x L40
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 495 119 61 32 19
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 5x 10x 19x 32x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x L40 2x L40 4x L40
GROMACS [ADH Dodec] ns/day ADH Dodec yes 67 566 - -
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 11x - -
GROMACS [Cellulose] ns/day Cellulose yes 19 161 - 212
GROMACS [Cellulose] NRF Cellulose yes 1x 12x - 16x
GROMACS [STMV] ns/day STMV yes 4 33 55 81
GROMACS [STMV] NRF STMV yes 1x 8x 13x 20x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://lammps.sandia.gov/index.html

https://ngc.nvidia.com/catalog/containers/hpc:lammps

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x L40 2x L40 4x L40 8x L40
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 4.45E+05 1.51E+06 2.89E+06 5.35E+06 8.11E+06
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 4x 9x 16x 25x
LAMMPS [SNAP] ATOM-Time Steps/s SNAP yes 1.08E+05 5.82E+05 1.16E+06 2.32E+06 4.58E+06
LAMMPS [SNAP] NRF SNAP yes 1x 6x 11x 22x 44x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 2.77E+07 1.23E+08 2.40E+08 4.63E+08 7.01E+08
LAMMPS [Tersoff] NRF Tersoff yes 1x 4x 9x 17x 26x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Multi-GPU, single node; handles systems of up to 100M atoms

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x L40 2x L40 4x L40 8x L40
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 19.15 188 386 761 1,542
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 10x 20x 40x 81x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 19.59 191 388 767 1,571
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 10x 20x 39x 80x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 20.75 240 481 970 1,917
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 12x 23x 47x 92x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 1.87 15 30 59 120
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 8x 16x 32x 64x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 1.81 15 31 62 123
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 8x 17x 34x 68x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 1.94 18 35 70 142
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 9x 18x 36x 73x

Detailed L4 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x L4 2x L4 4x L4 8x L4
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 84.61 1,146 2,323 4,731 9,554
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 14x 27x 56x 113x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 85.16 1,162 2,366 4,811 9,666
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 14x 28x 56x 113x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x L4 2x L4 4x L4 8x L4
GROMACS [ADH Dodec] ns/day ADH Dodec yes 67 209 346 464 -
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 4x 7x 9x -
GROMACS [Cellulose] ns/day Cellulose yes 19 57 94 133 162
GROMACS [Cellulose] NRF Cellulose yes 1x 3x 6x 10x 12x
GROMACS [STMV] ns/day STMV yes 4 12 22 43 63
GROMACS [STMV] NRF STMV yes 1x 3x 5x 10x 15x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Multi-GPU, single node; handles systems of up to 100M atoms

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x L4 2x L4 4x L4 8x L4
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 19.15 63 128 260 520
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 3x 7x 14x 27x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 19.59 - 131 266 535
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x - 7x 14x 27x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 20.75 86 172 347 701
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 4x 8x 17x 34x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 1.87 5 9 18 37
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 2x 5x 10x 20x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 1.81 5 - 19 39
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 3x - 11x 22x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 1.94 6 12 24 47
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 3x 6x 12x 24x

Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.13 182 364 726 1,456 172 334 674 1,375
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 44x 88x 176x 353x 42x 81x 163x 333x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.12 185 371 739 1,483 176 340 686 1,366
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 45x 90x 179x 360x 43x 83x 167x 331x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 20.71 796 1,594 3,175 6,383 769 1,525 3,054 6,139
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 38x 77x 153x 308x 37x 74x 147x 296x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 20.95 813 1,631 3,257 6,532 781 1,514 3,132 6,303
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 39x 78x 155x 312x 37x 72x 149x 301x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 84.61 2,883 5,761 11,512 23,433 2,819 5,476 11,121 23,249
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 34x 68x 136x 277x 33x 65x 131x 275x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 85.16 2,953 5,894 11,693 23,935 2,900 5,787 11,396 23,903
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 35x 69x 137x 281x 34x 68x 134x 281x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.38 54 107 214 429 53 107 214 427
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 39x 78x 155x 311x 39x 77x 155x 310x
AMBER [FEP-GTI_Complex 1fs] ns/day FEP-GTI_Complex yes 9.89 133 266 533 1,066 134 268 536 1,073
AMBER [FEP-GTI_Complex 1fs] NRF FEP-GTI_Complex yes 1x 13x 27x 54x 108x 14x 27x 54x 108x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://jeffersonlab.github.io/chroma/

https://ngc.nvidia.com/catalog/containers/hpc:chroma

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
Chroma Total Time (Sec) szscl21_24_128 no 1,115 36 20 11 7 44 25 13 9
Chroma NRF szscl21_24_128 yes 1x 32x 55x 99x 163x 26x 46x 84x 129x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 495 52 28 16 11 54 29 16 13
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 12x 22x 39x 55x 11x 21x 39x 49x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
GROMACS [ADH Dodec] ns/day ADH Dodec yes 67 372 506 677 - 389 - 518 -
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 7x 10x 13x - 8x - 10x -
GROMACS [Cellulose] ns/day Cellulose yes 19 108 174 254 290 108 122 183 -
GROMACS [Cellulose] NRF Cellulose yes 1x 8x 13x 19x 22x 8x 9x 14x -
GROMACS [STMV] ns/day STMV yes 4 24 44 80 128 24 39 65 92
GROMACS [STMV] NRF STMV yes 1x 5x 11x 20x 31x 5x 9x 16x 22x

GTC

Physics

GTC is used for gyrokinetic particle simulation of turbulent transport in burning plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
GTC Mpush/Sec moi#proc.in yes 35 472 898 3,622 478 909 1,755 2,706
GTC NRF moi#proc.in yes 1x 14x 26x 105x 14x 26x 51x 79x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 2,431 317 218 158 134 318 224 165
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 8x 11x 15x 18x 8x 11x 15x
ICON [QUBICC 160 km resolution] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 2,213 293 197 144 120 291 192 140
ICON [QUBICC 160 km resolution] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 8x 11x 15x 18x 8x 12x 16x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://lammps.sandia.gov/index.html

https://ngc.nvidia.com/catalog/containers/hpc:lammps

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
LAMMPS [LJ 2.5] ATOM-Time Steps/s LJ 2.5 yes 1.11E+08 6.00E+08 1.12E+09 2.01E+09 3.66E+09 6.00E+08 1.07E+09 1.81E+09 -
LAMMPS [LJ 2.5] NRF LJ 2.5 yes 1x 6x 10x 19x 34x 6x 10x 17x -
LAMMPS [EAM] ATOM-Time Steps/s EAM yes 5.33E+07 2.93E+08 5.35E+08 9.23E+08 1.58E+09 2.88E+08 5.04E+08 8.48E+08 -
LAMMPS [EAM] NRF EAM yes 1x 6x 10x 18x 31x 5x 10x 17x -
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 4.45E+05 5.24E+06 9.68E+06 1.70E+07 2.69E+07 5.28E+06 9.53E+06 1.62E+07 1.97E+07
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 16x 30x 52x 83x 16x 29x 50x 61x
LAMMPS [SNAP] ATOM-Time Steps/s SNAP yes 1.08E+05 2.21E+06 4.39E+06 8.73E+06 1.67E+07 2.11E+06 4.09E+06 8.12E+06 1.58E+07
LAMMPS [SNAP] NRF SNAP yes 1x 21x 42x 85x 162x 20x 40x 79x 153x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 2.77E+07 5.28E+08 9.81E+08 1.75E+09 2.99E+09 5.09E+08 8.74E+08 1.40E+09 -
LAMMPS [Tersoff] NRF Tersoff yes 1x 19x 36x 64x 110x 19x 32x 51x -

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB
MILC Total Time (Sec) Apex Medium no 71,595 2,029 1,184 629 361 2,088 1,111 614
MILC NRF Apex Medium yes 1x 39x 67x 125x 218x 38x 71x 128x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Multi-GPU, single node; handles systems of up to 100M atoms

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 19.15 175 347 689 1,368 172 341 693 1,372
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 9x 18x 36x 71x 9x 18x 36x 72x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 19.59 178 357 714 1,389 178 354 711 1,399
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 9x 18x 36x 71x 9x 18x 36x 71x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 20.75 215 436 870 1,731 214 424 851 1,714
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 10x 21x 42x 83x 10x 20x 41x 83x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 1.87 14 27 43 65 13 27 53 104
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 7x 14x 23x 35x 7x 14x 29x 56x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 1.81 14 28 56 66 14 26 55 110
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 8x 15x 31x 36x 8x 15x 31x 61x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 1.94 16 32 50 128 16 31 61 127
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 8x 17x 26x 66x 8x 16x 31x 65x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.0 CPU; V7.1 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
Quantum Espresso Total CPU Time (Sec) AUSURF112-jR no 718 111 71 47 36 114 70 49 39
Quantum Espresso NRF AUSURF112-jR yes 1x 7x 11x 17x 22x 7x 11x 16x 20x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install

https://ngc.nvidia.com/catalog/containers/hpc:relion

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB
Relion [Plasmodium Ribosome] Total Wall Clock (Sec) MB numbers Plasmodium Ribosome on Relion-3.0 no 12,742 2,736 1,627 1,439 2,601 1,523 1,383
Relion [Plasmodium Ribosome] NRF MB numbers Plasmodium Ribosome on Relion-3.0 yes 1x 5x 8x 9x 5x 8x 9x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 89,561 178,511 356,907 713,883 89,536 178,551 339,823 713,096
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 8x 16x 32x 63x 8x 16x 30x 63x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 12,903 25,764 51,122 102,187 12,901 25,796 51,402 102,510
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 3x 7x 14x 27x 3x 7x 14x 27x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 13,957 27,664 54,933 108,607 13,743 27,265 53,741 107,880
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 4x 7x 15x 29x 4x 7x 14x 29x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A100 SXM4 80GB 2x A100 SXM4 80GB 4x A100 SXM4 80GB 8x A100 SXM4 80GB 1x A100 PCIe 80GB 2x A100 PCIe 80GB 4x A100 PCIe 80GB 8x A100 PCIe 80GB
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,268 77 40 21 13 78 41 22 15
SPECFEM3D NRF four_material_simple_model yes 1x 19x 36x 68x 116x 19x 35x 67x 100x

Detailed A30 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.13 89 177 355 714
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 22x 43x 86x 173x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.12 91 181 362 727
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 22x 44x 88x 176x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 20.71 406 811 1,616 3,241
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 20x 39x 78x 156x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 20.95 418 826 1,651 3,311
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 20x 39x 79x 158x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 84.61 1,503 2,989 5,973 11,932
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 18x 35x 71x 141x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 85.16 1,531 3,045 6,077 12,277
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 18x 36x 71x 144x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.38 29 58 116 233
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 21x 42x 84x 169x
AMBER [FEP-GTI_Complex 1fs] ns/day FEP-GTI_Complex yes 9.89 99 198 395 790
AMBER [FEP-GTI_Complex 1fs] NRF FEP-GTI_Complex yes 1x 10x 20x 40x 80x
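The AMBER test names end in a timestep (4fs for the PME runs, 1fs for FEP-GTI_Complex). Assuming that suffix is the MD integration timestep, ns/day converts to integration steps per wall-clock second, which makes runs with different timesteps easier to compare. A minimal sketch; the function and constant names are illustrative:

```python
FS_PER_NS = 1_000_000      # 1 ns = 10^6 fs
SECONDS_PER_DAY = 86_400


def ns_per_day_to_steps_per_second(ns_per_day: float, timestep_fs: float) -> float:
    """Convert simulated ns/day to MD integration steps per wall-clock second."""
    steps_per_day = ns_per_day * FS_PER_NS / timestep_fs
    return steps_per_day / SECONDS_PER_DAY


# AMBER rows from the A30 table above (timestep taken from the test name).
rows = [
    ("PME-Cellulose_NPT_4fs, CPU-only", 4.13, 4.0),
    ("PME-Cellulose_NPT_4fs, 1x A30", 89, 4.0),
    ("FEP-GTI_Complex 1fs, 1x A30", 99, 1.0),
]
for name, ns_day, dt in rows:
    print(f"{name}: {ns_per_day_to_steps_per_second(ns_day, dt):,.1f} steps/s")
```

The conversion shows that ns/day figures at different timesteps are not directly comparable: 99 ns/day at 1 fs corresponds to about 1,146 steps/s, while 89 ns/day at 4 fs corresponds to about 258 steps/s.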

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://jeffersonlab.github.io/chroma/

https://ngc.nvidia.com/catalog/containers/hpc:chroma

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x A30 4x A30 8x A30
Chroma Total Time (Sec) szscl21_24_128 no 1,115 35 18 11
Chroma NRF szscl21_24_128 yes 1x 33x 62x 103x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 495 111 55 29 18
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 5x 11x 21x 34x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
GROMACS [ADH Dodec] ns/day ADH Dodec yes 67 201 287 378 -
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 3x 6x 7x -
GROMACS [Cellulose] ns/day Cellulose yes 19 60 91 119 147
GROMACS [Cellulose] NRF Cellulose yes 1x 3x 5x 9x 11x
GROMACS [STMV] ns/day STMV yes 4 12 22 41 59
GROMACS [STMV] NRF STMV yes 1x 3x 5x 10x 14x
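The GROMACS rows show why NRF is not simply GPU throughput divided by CPU throughput: for ADH Dodec on 2x A30, 287 / 67 is about 4.3, yet the published NRF is 6x. Per the NRF definition at the top of this page, CPU-only performance is measured on up to 8 servers and extrapolated linearly beyond that, so sublinear CPU scaling raises the NRF. The sketch below estimates the (fractional) number of CPU-only servers needed to match a GPU result from a CPU scaling curve. The interpolation and extrapolation rules are assumptions, not NVIDIA's published method, and the CPU curve in the example is made up because this section lists only single-server CPU results.

```python
from typing import Sequence


def node_replacement_factor(cpu_perf: Sequence[float], gpu_perf: float) -> float:
    """Estimate how many CPU-only servers match one GPU node (bigger-is-better metric).

    cpu_perf[i] is the throughput of (i + 1) CPU-only servers. Between measured
    points we interpolate linearly (assumption). Beyond the last point we assume
    throughput grows in proportion to server count from that point (assumption).
    """
    if not cpu_perf or any(p <= 0 for p in cpu_perf):
        raise ValueError("cpu_perf must be non-empty and positive")
    if gpu_perf <= cpu_perf[0]:
        return gpu_perf / cpu_perf[0]  # less than one server's worth
    for i in range(1, len(cpu_perf)):
        lo, hi = cpu_perf[i - 1], cpu_perf[i]
        if gpu_perf <= hi:
            # Between i and i + 1 servers; hi > lo is guaranteed here.
            return i + (gpu_perf - lo) / (hi - lo)
    # Linear extrapolation beyond the largest measured server count.
    per_server = cpu_perf[-1] / len(cpu_perf)
    return gpu_perf / per_server


# Hypothetical, made-up CPU-only scaling curve for 1..8 servers (not measured data).
cpu_curve = [10.0, 18.0, 25.0, 31.0, 36.0, 40.0, 43.0, 45.0]
print(round(node_replacement_factor(cpu_curve, 30.0), 2))
print(node_replacement_factor(cpu_curve, 90.0))
```

With this made-up curve, a GPU result of 30 is only 3x a single server but needs about 3.8 servers to match, which is the same direction as the gap between the simple ratio and the published GROMACS NRF.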

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
GTC Mpush/Sec moi#proc.in yes 35 285 531 1,049 1,774
GTC NRF moi#proc.in yes 1x 8x 15x 31x 52x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 2,431 571 354 233 206
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 4x 7x 10x 12x
ICON [QUBICC 160 km resolution] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 2,213 502 302 193 164
ICON [QUBICC 160 km resolution] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 4x 7x 11x 13x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://lammps.sandia.gov/index.html

https://ngc.nvidia.com/catalog/containers/hpc:lammps

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
LAMMPS [LJ 2.5] ATOM-Time Steps/s LJ 2.5 yes 1.11E+08 3.09E+08 5.94E+08 1.10E+09 1.46E+09
LAMMPS [LJ 2.5] NRF LJ 2.5 yes 1x 3x 5x 10x 14x
LAMMPS [EAM] ATOM-Time Steps/s EAM yes 5.33E+07 1.37E+08 2.58E+08 4.70E+08 7.30E+08
LAMMPS [EAM] NRF EAM yes 1x 3x 5x 9x 14x
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 4.45E+05 2.88E+06 5.52E+06 9.98E+06 1.41E+07
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 9x 17x 31x 44x
LAMMPS [SNAP] ATOM-Time Steps/s SNAP yes 1.08E+05 1.11E+06 2.19E+06 4.37E+06 8.54E+06
LAMMPS [SNAP] NRF SNAP yes 1x 11x 21x 42x 83x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 2.77E+07 2.51E+08 4.37E+08 7.96E+08 1.03E+09
LAMMPS [Tersoff] NRF Tersoff yes 1x 9x 16x 29x 38x
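The 1x/2x/4x/8x columns also show how well each code scales across GPUs within a node. A common summary is parallel efficiency, E(n) = P(n) / (n * P(1)), for a throughput metric such as ATOM-Time Steps/s. A minimal sketch using two of the LAMMPS A30 rows above; the function name is illustrative:

```python
from typing import Dict, Sequence


def parallel_efficiency(gpu_counts: Sequence[int], throughput: Sequence[float]) -> Dict[int, float]:
    """Efficiency of each GPU count relative to perfect linear scaling.

    Assumes a bigger-is-better metric and uses gpu_counts[0] as the baseline.
    """
    if len(gpu_counts) != len(throughput) or not gpu_counts:
        raise ValueError("need matching, non-empty sequences")
    base_n, base_p = gpu_counts[0], throughput[0]
    return {n: (p / base_p) / (n / base_n) for n, p in zip(gpu_counts, throughput)}


counts = [1, 2, 4, 8]
lammps_a30 = {
    "SNAP": [1.11e6, 2.19e6, 4.37e6, 8.54e6],
    "ReaxFF/C": [2.88e6, 5.52e6, 9.98e6, 1.41e7],
}
for potential, values in lammps_a30.items():
    eff = parallel_efficiency(counts, values)
    print(potential, {n: f"{e:.0%}" for n, e in eff.items()})
```

By this measure SNAP stays above 95% efficiency out to 8 GPUs, while ReaxFF/C drops to roughly 61%.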

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
MILC Total Time (Sec) Apex Medium no 71,595 4,710 2,025 1,087 697
MILC NRF Apex Medium yes 1x 17x 39x 72x 113x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 19.15 91 181 362 726
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 5x 9x 19x 38x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 19.59 94 187 371 745
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 5x 10x 19x 38x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 20.75 111 221 441 882
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 5x 11x 21x 42x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 1.87 7 14 29 58
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 4x 8x 15x 31x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 1.81 7 15 30 59
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 4x 8x 16x 33x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 1.94 8 16 32 65
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 4x 8x 17x 34x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install

https://ngc.nvidia.com/catalog/containers/hpc:relion

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
Relion [Plasmodium Ribosome] Total Wall Clock (Sec) MB numbers Plasmodium Ribosome on Relion-3.0 no 12,742 3,417 1,861 1,423 1,297
Relion [Plasmodium Ribosome] NRF MB numbers Plasmodium Ribosome on Relion-3.0 yes 1x 4x 7x 9x 10x
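The RELION rows flatten out quickly: wall-clock time falls from 3,417 s on 1x A30 to 1,297 s on 8x A30, well short of an 8x reduction. One standard way to quantify this is the Karp-Flatt metric, the experimentally determined serial fraction e = (1/S - 1/n) / (1 - 1/n), where S is the speedup over one GPU. Applying it here is an illustrative analysis choice, not something this page reports; a rising e suggests serial work or overhead that grows with GPU count. A minimal sketch for time-based (smaller-is-better) metrics:

```python
def karp_flatt(t1: float, tn: float, n: int) -> float:
    """Experimentally determined serial fraction from wall-clock times.

    t1: time on the 1-GPU baseline, tn: time on n GPUs, with n > 1.
    """
    if n <= 1:
        raise ValueError("n must be greater than 1")
    speedup = t1 / tn
    return (1 / speedup - 1 / n) / (1 - 1 / n)


# RELION Plasmodium Ribosome, Total Wall Clock (Sec), from the A30 table above.
relion_a30 = {1: 3417, 2: 1861, 4: 1423, 8: 1297}
t1 = relion_a30[1]
for n, tn in relion_a30.items():
    if n > 1:
        print(f"{n}x A30: speedup {t1 / tn:.2f}, serial fraction {karp_flatt(t1, tn, n):.2f}")
```

The estimated serial fraction climbs from about 0.09 at 2 GPUs to about 0.29 at 8 GPUs, which is consistent with the NRF barely moving (9x to 10x) between 4x and 8x A30.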

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 44,051 87,806 175,438 350,760
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 4x 8x 16x 31x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 6,757 13,361 26,710 53,281
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 2x 4x 7x 14x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 7,026 13,899 27,642 55,140
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 2x 4x 7x 15x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A30 2x A30 4x A30 8x A30
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,268 156 80 41 23
SPECFEM3D NRF four_material_simple_model yes 1x 9x 18x 35x 64x

Detailed A40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics of biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.13 97 195 390 781
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 23x 47x 94x 189x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.12 98 198 396 794
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 24x 48x 96x 193x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 20.71 486 984 1,965 3,954
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 23x 48x 95x 191x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 20.95 497 1,006 2,015 4,022
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 24x 48x 96x 192x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 84.61 1,922 3,889 7,780 15,568
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 23x 46x 92x 184x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 85.16 1,948 3,946 7,906 16,037
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 23x 46x 93x 188x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.38 32 63 127 254
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 23x 46x 92x 184x
AMBER [FEP-GTI_Complex 1fs] ns/day FEP-GTI_Complex yes 9.89 116 232 463 926
AMBER [FEP-GTI_Complex 1fs] NRF FEP-GTI_Complex yes 1x 12x 23x 47x 94x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://jeffersonlab.github.io/chroma/

https://ngc.nvidia.com/catalog/containers/hpc:chroma

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
Chroma Total Time (Sec) szscl21_24_128 no 1,115 78 41 22 13
Chroma NRF szscl21_24_128 yes 1x 15x 28x 52x 89x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 495 231 117 59 32
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 2x 5x 10x 19x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
GROMACS [ADH Dodec] ns/day ADH Dodec yes 67 340 379 505 -
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 7x 8x 10x -
GROMACS [Cellulose] ns/day Cellulose yes 19 77 110 160 177
GROMACS [Cellulose] NRF Cellulose yes 1x 5x 8x 12x 13x
GROMACS [STMV] ns/day STMV yes 4 20 38 61 75
GROMACS [STMV] NRF STMV yes 1x 5x 9x 15x 18x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
GTC Mpush/Sec moi#proc.in yes 35 305 563 1,112 1,854
GTC NRF moi#proc.in yes 1x 9x 16x 32x 54x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 2,431 741 420 262 223
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 3x 6x 9x 11x
ICON [QUBICC 160 km resolution] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 2,213 747 415 253 192
ICON [QUBICC 160 km resolution] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 3x 5x 9x 12x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://lammps.sandia.gov/index.html

https://ngc.nvidia.com/catalog/containers/hpc:lammps

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 4.45E+05 6.85E+05 1.32E+06 2.50E+06 4.28E+06
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 2x 3x 7x 13x
LAMMPS [SNAP] ATOM-Time Steps/s SNAP yes 1.08E+05 2.43E+05 4.87E+05 9.74E+05 1.93E+06
LAMMPS [SNAP] NRF SNAP yes 1x 2x 5x 9x 19x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 2.77E+07 5.23E+07 1.03E+08 2.02E+08 3.51E+08
LAMMPS [Tersoff] NRF Tersoff yes 1x 2x 4x 7x 13x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
MILC Total Time (Sec) Apex Medium no 71,595 6,005 3,094 1,762 1,074
MILC NRF Apex Medium yes 1x 13x 25x 45x 73x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 19.15 103 208 416 835
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 5x 11x 22x 44x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 19.59 109 220 440 882
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 6x 11x 22x 45x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 20.75 144 292 585 1,172
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 7x 14x 28x 56x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 1.87 8 15 30 61
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 4x 8x 16x 32x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 1.81 8 16 32 64
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 4x 9x 18x 35x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 1.94 10 20 39 79
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 5x 10x 20x 41x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install

https://ngc.nvidia.com/catalog/containers/hpc:relion

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
Relion [Plasmodium Ribosome] Total Wall Clock (Sec) MB numbers Plasmodium Ribosome on Relion-3.0 no 12,742 3,207 1,716 1,344 1,323
Relion [Plasmodium Ribosome] NRF MB numbers Plasmodium Ribosome on Relion-3.0 yes 1x 4x 7x 9x 10x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x A40 2x A40 4x A40 8x A40
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,268 203 103 53 29
SPECFEM3D NRF four_material_simple_model yes 1x 6x 14x 27x 50x

Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics of biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.13 100 202 406 805 97 199 400 808
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 24x 49x 98x 195x 24x 48x 97x 196x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.12 101 205 412 818 99 202 406 815
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 25x 50x 100x 198x 24x 49x 98x 198x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 20.71 483 953 1,915 3,787 470 936 1,873 3,784
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 23x 46x 92x 183x 23x 45x 90x 183x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 20.95 496 978 1,964 3,892 475 959 1,926 3,869
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 24x 47x 94x 186x 23x 46x 92x 185x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 84.61 1,870 3,293 6,613 13,031 1,789 3,293 6,598 13,149
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 22x 39x 78x 154x 21x 39x 78x 155x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 85.16 1,907 3,389 6,795 13,371 1,822 3,387 6,779 13,533
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 22x 40x 80x 157x 21x 40x 80x 159x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.38 31 62 125 249 28 57 113 226
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 23x 45x 90x 180x 20x 41x 82x 164x
AMBER [FEP-GTI_Complex 1fs] ns/day FEP-GTI_Complex yes 9.89 120 240 480 960 122 245 489 979
AMBER [FEP-GTI_Complex 1fs] NRF FEP-GTI_Complex yes 1x 12x 24x 49x 97x 12x 25x 49x 99x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://jeffersonlab.github.io/chroma/

https://ngc.nvidia.com/catalog/containers/hpc:chroma

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
Chroma Total Time (Sec) szscl21_24_128 no 1,115 165 31 17 10 142 28 15 13
Chroma NRF szscl21_24_128 yes 1x 7x 37x 68x 111x 8x 41x 77x 85x
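Rows in these tables are flattened to space-separated text: application and test name, metric, test module, the "Bigger is better" flag, the CPU-only value, then one value per GPU column. Because names contain spaces, parsing from the right end of the row is more reliable than splitting from the left. A minimal sketch, assuming you already know how many GPU columns the table header lists (8 for the V100 tables that compare SXM2 and PCIe); the `Row` type and function names are illustrative:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    label: str                     # application, test, metric, and module text
    bigger_is_better: bool
    cpu: Optional[float]
    gpus: List[Optional[float]]


def parse_value(token: str) -> Optional[float]:
    """Parse '1,115', '37x', or '1.11E+08'; '-' (not measured) becomes None."""
    if token == "-":
        return None
    return float(token.replace(",", "").rstrip("x"))


def parse_row(line: str, n_gpu_columns: int) -> Row:
    tokens = line.split()
    n_values = n_gpu_columns + 1           # CPU-only column plus GPU columns
    if len(tokens) < n_values + 2:
        raise ValueError("row too short for the given column count")
    values = [parse_value(t) for t in tokens[-n_values:]]
    flag = tokens[-n_values - 1]           # the "Bigger is better" column
    if flag not in ("yes", "no"):
        raise ValueError(f"expected yes/no flag, got {flag!r}")
    label = " ".join(tokens[:-n_values - 1])
    return Row(label, flag == "yes", values[0], values[1:])


line = "Chroma Total Time (Sec) szscl21_24_128 no 1,115 165 31 17 10 142 28 15 13"
row = parse_row(line, n_gpu_columns=8)
sxm2, pcie = row.gpus[:4], row.gpus[4:]   # 1x/2x/4x/8x SXM2, then 1x/2x/4x/8x PCIe
print(row.label, row.bigger_is_better, row.cpu, sxm2, pcie)
```

The column count has to come from each table's header line, since the number of GPU configurations varies from table to table (the A30 Chroma table, for example, starts at 2x A30).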

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 495 99 50 26 15 88 45 23 14
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 5x 12x 24x 41x 6x 14x 26x 44x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 1x RTX6000 2x RTX6000 4x RTX6000 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB
GROMACS [ADH Dodec] ns/day ADH Dodec yes 67 266 311 472 251 296 - 270 288 330
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 5x 6x 9x 5x 6x - 5x 6x 7x
GROMACS [Cellulose] ns/day Cellulose yes 19 71 103 156 60 83 - 73 98 -
GROMACS [Cellulose] NRF Cellulose yes 1x 4x 6x 12x 3x 5x - 4x 6x -
GROMACS [STMV] ns/day STMV yes 4 16 30 53 13 25 32 16 29 38
GROMACS [STMV] NRF STMV yes 1x 3x 7x 13x 3x 6x 7x 3x 7x 9x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
GTC Mpush/Sec moi#proc.in yes 35 271 510 1,011 1,796 298 552 1,081 1,945
GTC NRF moi#proc.in yes 1x 8x 15x 29x 52x 9x 16x 31x 57x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 2,431 591 353 223 167 819 578 248
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 4x 7x 11x 15x 3x 4x 10x
ICON [QUBICC 160 km resolution] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 2,213 514 304 192 143 697 438 215
ICON [QUBICC 160 km resolution] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 4x 7x 12x 16x 3x 5x 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://lammps.sandia.gov/index.html

https://ngc.nvidia.com/catalog/containers/hpc:lammps

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] ATOM-Time Steps/s LJ 2.5 yes 1.11E+08 3.41E+08 6.34E+08 1.24E+09 2.24E+09 3.45E+08 6.23E+08 1.15E+09 1.87E+09
LAMMPS [LJ 2.5] NRF LJ 2.5 yes 1x 3x 6x 11x 21x 3x 6x 11x 17x
LAMMPS [EAM] ATOM-Time Steps/s EAM yes 5.33E+07 1.23E+08 2.67E+08 5.39E+08 9.74E+08 1.25E+08 2.66E+08 5.15E+08 8.23E+08
LAMMPS [EAM] NRF EAM yes 1x 2x 5x 11x 19x 2x 5x 10x 16x
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 4.45E+05 3.23E+06 6.09E+06 1.14E+07 1.94E+07 3.44E+06 6.42E+06 1.19E+07 1.91E+07
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 10x 19x 35x 60x 11x 20x 37x 59x
LAMMPS [SNAP] ATOM-Time Steps/s SNAP yes 1.08E+05 1.42E+06 2.86E+06 5.69E+06 1.14E+07 1.40E+06 2.80E+06 5.58E+06 1.12E+07
LAMMPS [SNAP] NRF SNAP yes 1x 14x 28x 55x 111x 14x 27x 54x 108x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 2.77E+07 2.71E+08 4.95E+08 9.62E+08 1.80E+09 2.81E+08 5.18E+08 9.83E+08 1.56E+09
LAMMPS [Tersoff] NRF Tersoff yes 1x 10x 18x 35x 66x 10x 19x 36x 57x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
MILC Total Time (Sec) Apex Medium no 71,595 4,737 2,347 1,229 689 3,864 2,020 1,103 1,068
MILC NRF Apex Medium yes 1x 17x 34x 64x 114x 20x 39x 71x 74x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x RTX6000 2x RTX6000 4x RTX6000 8x RTX6000 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 19.15 111 223 449 890 66 133 266 532 114 227 455 905
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 6x 12x 23x 46x 3x 7x 14x 28x 6x 12x 24x 47x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 19.59 116 235 470 935 70 141 282 562 119 236 473 943
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 6x 12x 24x 48x 4x 7x 14x 29x 6x 12x 24x 48x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 20.75 142 285 571 1,148 89 179 358 717 144 286 573 1,145
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 7x 14x 28x 55x 4x 9x 17x 35x 7x 14x 28x 55x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 1.87 8 17 34 68 5 10 21 41 9 18 35 70
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 5x 9x 18x 36x 3x 6x 11x 22x 5x 9x 19x 38x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 1.81 9 18 36 71 5 11 22 44 9 18 36 72
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 5x 10x 20x 39x 3x 6x 12x 24x 5x 10x 20x 40x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 1.94 10 20 40 79 6 13 26 51 10 20 40 80
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 5x 10x 20x 41x 3x 7x 13x 26x 5x 10x 21x 41x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 4x V100 SXM2 32GB 4x V100S PCIe 32GB
NV-WRFg Seconds / Timestep Conus_2.5k_JA no 6 0.62 0.68
NV-WRFg NRF Conus_2.5k_JA yes 1x 10x 9x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.0 CPU; V7.1 GPU

ACCELERATED FEATURES

  • Linear algebra (matrix multiply)
  • Explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
Quantum Espresso Total CPU Time (Sec) AUSURF112-jR no 718 270 133 82 58 260 130 88 69
Quantum Espresso NRF AUSURF112-jR yes 1x 3x 6x 10x 14x 3x 6x 9x 12x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install

https://ngc.nvidia.com/catalog/containers/hpc:relion

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB
Relion [Plasmodium Ribosome] Total Wall Clock (Sec) MB numbers Plasmodium Ribosome on Relion-3.0 no 12,742 3,417 2,095 3,443 2,083
Relion [Plasmodium Ribosome] NRF MB numbers Plasmodium Ribosome on Relion-3.0 yes 1x 4x 6x 4x 6x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 38,091 75,978 152,022 303,986 46,037 91,790 183,515 367,252
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 3x 7x 13x 27x 4x 8x 16x 32x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 8,538 16,885 33,070 65,732 9,276 18,304 36,393 72,591
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 2x 4x 9x 17x 2x 5x 10x 19x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 7,165 14,203 28,177 56,235 8,491 16,871 33,547 66,849
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 2x 4x 7x 15x 2x 4x 9x 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 SXM2 32GB 2x V100 SXM2 32GB 4x V100 SXM2 32GB 8x V100 SXM2 32GB 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,268 159 82 44 25 131 68 37 23
SPECFEM3D NRF four_material_simple_model yes 1x 9x 18x 33x 58x 11x 21x 39x 63x

Detailed T4 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.13 61 121 245
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 15x 29x 59x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.12 62 123 248
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 15x 30x 60x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 20.71 285 603 1,213
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 14x 29x 59x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 20.95 292 616 1,202
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 14x 29x 57x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 84.61 1,245 2,365 4,491
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 15x 28x 53x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 85.16 1,259 2,504 4,979
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 15x 29x 58x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.38 21 42 83
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 15x 30x 60x
AMBER [FEP-GTI_Complex 1fs] ns/day FEP-GTI_Complex yes 9.89 107 213 427
AMBER [FEP-GTI_Complex 1fs] NRF FEP-GTI_Complex yes 1x 11x 22x 43x
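The ns/day figures can be turned into integration steps per second or time-to-solution. The sketch below assumes the "_4fs" suffix in the test names denotes a 4 fs timestep (and "1fs" a 1 fs timestep), and it treats each table value as the rate of a single trajectory, which the page does not state for multi-GPU runs. The helper names are hypothetical.

```python
FS_PER_NS = 1_000_000
SECONDS_PER_DAY = 86_400


def steps_per_second(ns_per_day: float, timestep_fs: float) -> float:
    """MD integration steps per second implied by a ns/day rate and a timestep in fs."""
    return ns_per_day * FS_PER_NS / timestep_fs / SECONDS_PER_DAY


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to reach target_ns of simulated time."""
    return target_ns / ns_per_day


# AMBER [PME-JAC_NPT_4fs] rates from the table above (4 fs timestep assumed from the name).
cpu_rate = 84.61   # Dual Cascade Lake 6240
t4_rate = 4_491    # 8x T4 PCIe, assumed here to describe a single trajectory

print(f"CPU-only: {steps_per_second(cpu_rate, 4):.0f} steps/s, "
      f"{days_to_simulate(1_000, cpu_rate):.1f} days per microsecond")
print(f"8x T4:    {steps_per_second(t4_rate, 4):.0f} steps/s, "
      f"{days_to_simulate(1_000, t4_rate) * 24:.1f} hours per microsecond")
```

For the FEP-GTI_Complex rows, pass `timestep_fs=1` instead.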

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.08

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://jeffersonlab.github.io/chroma/

https://ngc.nvidia.com/catalog/containers/hpc:chroma

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
Chroma Total Time (Sec) szscl21_24_128 no 1,115 117 40 26
Chroma NRF szscl21_24_128 yes 1x 10x 28x 44x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe
GROMACS [ADH Dodec] ns/day ADH Dodec yes 67 163 238
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 3x 5x
GROMACS [STMV] ns/day STMV yes 4 - 20
GROMACS [STMV] NRF STMV yes 1x - 5x
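Each row above is a flattened table line, and a "-" appears where, presumably, no result was reported for that configuration. A minimal parsing sketch, assuming the last tokens of every row are the "Bigger is better" flag, the CPU-only baseline, and one value per GPU configuration; `parse_row_tail` is a hypothetical helper, not an NVIDIA tool.

```python
from typing import List, Optional, Tuple


def parse_row_tail(row: str, n_configs: int) -> Tuple[bool, List[Optional[float]]]:
    """Return (bigger_is_better, [cpu_baseline, gpu_1, ..., gpu_n]) for one flattened row.

    "-" becomes None; thousands separators and the "x" suffix of NRF values are stripped.
    """
    tokens = row.split()
    tail = tokens[-(n_configs + 1):]      # CPU-only value plus one value per GPU config
    flag = tokens[-(n_configs + 2)]       # the "Bigger is better" column: "yes" or "no"
    values: List[Optional[float]] = []
    for tok in tail:
        if tok == "-":
            values.append(None)
        else:
            values.append(float(tok.rstrip("x").replace(",", "")))
    return flag == "yes", values


# The GROMACS table above has two GPU configurations (2x T4 PCIe, 4x T4 PCIe).
print(parse_row_tail("GROMACS [STMV] ns/day STMV yes 4 - 20", n_configs=2))
print(parse_row_tail("GROMACS [STMV] NRF STMV yes 1x - 5x", n_configs=2))
```

Splitting from the right avoids having to separate the multi-word application, metric, and test-module fields.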

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
GTC Mpush/Sec moi#proc.in yes 35 236 466 893
GTC NRF moi#proc.in yes 1x 7x 14x 26x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
MILC Total Time (Sec) Apex Medium no 71,595 7,563 3,898 2,135
MILC NRF Apex Medium yes 1x 10x 20x 37x
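Because the MILC table has no single-T4 column, strong-scaling efficiency for this fixed-size problem can only be measured relative to the smallest configuration, 2x T4. For a time metric, efficiency is (baseline time x baseline GPUs) / (time x GPUs). A sketch with a hypothetical helper name; with the values above it gives about 97.0% at 4 GPUs and 88.6% at 8 GPUs.

```python
def time_scaling_efficiency(base_time: float, base_gpus: int, time: float, gpus: int) -> float:
    """Strong-scaling efficiency for a fixed problem measured in wall-clock time."""
    return (base_time * base_gpus) / (time * gpus)


# MILC [Apex Medium] total time in seconds on T4 PCIe, from the table above.
milc_t4 = {2: 7_563, 4: 3_898, 8: 2_135}

for n, t in milc_t4.items():
    eff = time_scaling_efficiency(milc_t4[2], 2, t, n)
    print(f"{n}x T4: {eff:.1%} relative to 2x T4")
```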

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Capable of simulating up to 100M atoms; multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 19.15 57 113 229
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 3x 6x 12x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 19.59 59 117 239
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 3x 6x 12x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 20.75 75 149 303
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 4x 7x 15x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 1.87 - 9 17
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x - 5x 9x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 1.81 5 9 17
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 3x 5x 10x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 1.94 - 10 20
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x - 5x 10x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install

https://ngc.nvidia.com/catalog/containers/hpc:relion

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe
Relion [Plasmodium Ribosome] Total Wall Clock (Sec) MB numbers Plasmodium Ribosome on Relion-3.0 no 12,742 3,586 2,549
Relion [Plasmodium Ribosome] NRF MB numbers Plasmodium Ribosome on Relion-3.0 yes 1x 4x 5x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,268 239 122 64
SPECFEM3D NRF four_material_simple_model yes 1x 5x 12x 23x