NVIDIA HPC Application Performance
Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. NVIDIA data center GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, lower power consumption, and reduced networking overhead, for total cost savings of 5X-10X.
The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers. The NRF varies by application.
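As a rough illustration of how an NRF value relates to the throughput figures in the tables, the sketch below divides GPU-server throughput by CPU-only server throughput and rounds to the nearest integer. This is an assumption about the arithmetic, not the published measurement procedure (which measures up to 8 CPU-only servers and extrapolates linearly), and the function name `node_replacement_factor` is hypothetical. The input numbers are the AMBER PME-Cellulose_NPT_4fs ns/day figures from the H200 table.

```python
def node_replacement_factor(gpu_throughput: float, cpu_throughput: float) -> int:
    """Approximate NRF as the ratio of GPU-server to CPU-only server throughput.

    Assumption: a plain ratio of the published, rounded figures.
    Only meaningful for metrics where bigger is better (e.g. ns/day).
    """
    if cpu_throughput <= 0:
        raise ValueError("cpu_throughput must be positive")
    return round(gpu_throughput / cpu_throughput)


# AMBER PME-Cellulose_NPT_4fs, ns/day (values copied from the H200 table)
cpu_only = 10.14
h200 = {1: 301, 2: 602, 4: 1221, 8: 2427}

# Map GPU count to the estimated NRF
nrf = {gpus: node_replacement_factor(value, cpu_only) for gpus, value in h200.items()}
print(nrf)
```

For this row the ratio reproduces the published NRF values of 30x, 59x, 120x, and 239x. For time-based rows (where bigger is not better), the ratio of the rounded times shown does not always match the published NRF, so treat the calculation as an approximation only.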
Detailed H200 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.4-AT_23.4
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |
---|---|---|---|---|---|---|---|---|
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 301 | 602 | 1,221 | 2,427 |
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 30x | 59x | 120x | 239x |
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 306 | 612 | 1,246 | 2,504 |
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 29x | 59x | 119x | 240x |
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 1,307 | 2,617 | 5,416 | 10,438 |
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 22x | 44x | 91x | 175x |
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 1,332 | 2,666 | 5,377 | 10,684 |
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 24x | 48x | 97x | 193x |
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 4,335 | 8,677 | 17,255 | 32,487 |
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 18x | 37x | 74x | 138x |
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 4,533 | 8,916 | 17,733 | 33,431 |
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 19x | 38x | 76x | 144x |
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 90 | 180 | 361 | 722 |
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 30x | 61x | 121x | 242x |
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 203 | 406 | 812 | 1,624 |
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 16x | 32x | 63x |
AMBER is measured by running multiple independent instances using NVIDIA Multi-Process Service (MPS).
FUN3D
Engineering
Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications
VERSION
14.0.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |
---|---|---|---|---|---|---|---|---|
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 24 | 14 | 9 | 7 |
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 10x | 17x | 27x | 33x |
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 35 | 20 | 12 | 9 |
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 11x | 20x | 33x | 43x |
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | 100 | 54 | 32 | 21 |
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 11x | 20x | 34x | 52x |
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 42 | 26 |
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 35x | 56x |
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | 123 | 72 |
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 37x | 63x |
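The tables on this page share a fixed layout: four descriptive columns followed by one value per hardware configuration, with `-` marking configurations that were not reported. A minimal parsing sketch, assuming that pipe-delimited layout, is shown below; the helper names `split_cells`, `parse_value`, and `parse_row` are hypothetical and not part of any NVIDIA tool.

```python
from typing import Dict, List, Optional, Union


def split_cells(line: str) -> List[str]:
    """Split one pipe-delimited table line into trimmed cells."""
    return [cell.strip() for cell in line.strip().rstrip("|").split("|")]


def parse_value(cell: str) -> Optional[float]:
    """Convert '1,088', '35x', or '3.51E+08' to a float; '-' or '' means not reported."""
    text = cell.replace(",", "").rstrip("x")
    if text in ("", "-"):
        return None
    return float(text)


def parse_row(header: str, row: str) -> Dict[str, Union[str, float, None]]:
    """Pair header names with row cells; the first four columns stay as text."""
    names = split_cells(header)
    cells = split_cells(row)
    record: Dict[str, Union[str, float, None]] = dict(zip(names[:4], cells[:4]))
    for name, cell in zip(names[4:], cells[4:]):
        record[name] = parse_value(cell)
    return record


HEADER = ("Application | Metric | Test Modules | Bigger is better | "
          "Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |")
ROW = "Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 42 | 26 |"

record = parse_row(HEADER, ROW)
lower_is_better = record["Bigger is better"] == "no"
print(record["Metric"], lower_is_better, record["1x H200"], record["8x H200"])
```

Keeping the "Bigger is better" flag alongside the numbers matters when comparing rows, because loop times and total times improve as they shrink while ns/day and NRF improve as they grow.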
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2024
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |
---|---|---|---|---|---|---|---|---|
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 189 | 769 | 1,581 | 2,609 | 5,233 |
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 4x | 8x | 14x | 28x |
GROMACS [STMV] | ns/day | STMV | yes | 14 | 44 | 73 | 123 | 182 |
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 6x | 12x | 18x |
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
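One way to read the multi-GPU columns is as scaling efficiency: the throughput achieved on N GPUs divided by N times the single-GPU throughput. The sketch below applies this to the GROMACS STMV ns/day row from the H200 table. It assumes the STMV figures come from one simulation spread across GPUs (only ADH Dodec is noted as a multi-instance measurement), and the function name `scaling_efficiency` is hypothetical.

```python
from typing import Dict


def scaling_efficiency(throughput_by_gpus: Dict[int, float]) -> Dict[int, float]:
    """Return throughput on N GPUs relative to N times the per-GPU baseline.

    The baseline is taken from the smallest GPU count present.
    A value of 1.0 means perfectly linear scaling.
    """
    base_gpus = min(throughput_by_gpus)
    per_gpu_baseline = throughput_by_gpus[base_gpus] / base_gpus
    return {
        gpus: round(value / (gpus * per_gpu_baseline), 2)
        for gpus, value in sorted(throughput_by_gpus.items())
    }


# GROMACS STMV, ns/day on 1, 2, 4, and 8 H200 GPUs (values copied from the table)
stmv = {1: 44, 2: 73, 4: 123, 8: 182}
efficiency = scaling_efficiency(stmv)
print(efficiency)
```

Under these assumptions the efficiency falls from 1.0 on one GPU to roughly 0.5 on eight, which is consistent with the NRF for STMV growing more slowly than the GPU count.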
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |
---|---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 823 | 1,541 | 2,984 | 5,265 |
GTC | NRF | mpi#proc.in | yes | 1x | 10x | 19x | 36x | 63x |
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
Stable_2Aug2023
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |
---|---|---|---|---|---|---|---|---|
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.51E+08 | 1.39E+09 | 2.53E+09 | 4.52E+09 | 7.43E+09 |
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 4x | 7x | 14x | 22x |
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.85E+08 | 5.55E+08 | 1.03E+09 | 1.86E+09 | 2.95E+09 |
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 6x | 11x | 17x |
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 1.12E+07 | 2.01E+07 | 3.34E+07 | 5.00E+07 |
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 12x | 22x | 36x | 54x |
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | 4.00E+06 | 7.95E+06 | 1.57E+07 | 3.00E+07 |
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 9x | 18x | 36x | 68x |
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | 1.02E+09 | 1.79E+09 | 3.20E+09 | 5.79E+09 |
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 12x | 20x | 37x | 66x |
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_a2f9e61
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |
---|---|---|---|---|---|---|---|---|
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 969 | 526 | 300 | 188 |
MILC | NRF | Apex Medium | yes | 1x | 29x | 53x | 94x | 149x |
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3.b04
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |
---|---|---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 311 | 615 | 1,241 | 2,469 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 10x | 19x | 38x |
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 321 | 635 | 1,266 | 2,525 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 10x | 19x | 39x |
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 395 | 794 | 1,559 | 3,018 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 6x | 11x | 22x | 42x |
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | 27 | 54 | 107 | 214 |
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 16x | 32x |
NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | 27 | 55 | 109 | 219 |
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 8x | 16x | 33x |
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 32 | 65 | 129 | 258 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 9x | 19x | 37x |
NAMD is measured by running multiple independent instances using MPS
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_d2105bb
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 |
---|---|---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 38 | 21 | 12 | 9 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 11x | 21x | 37x | 49x |
Detailed GH200 96GB application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.4-AT_23.4
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB |
---|---|---|---|---|---|---|
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 297 | 1,211 |
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 29x | 119x |
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 302 | 1,259 |
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 29x | 121x |
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 1,286 | 5,333 |
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 22x | 89x |
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 1,313 | 5,502 |
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 24x | 100x |
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 4,387 | 17,023 |
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 19x | 73x |
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 4,496 | 17,315 |
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 19x | 74x |
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 94 | 374 |
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 31x | 126x |
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 202 | 807 |
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 31x |
AMBER is measured by running multiple independent instances using MPS
FUN3D
Engineering
Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications
VERSION
14.0.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB |
---|---|---|---|---|---|---|
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 24 | 10 |
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 10x | 25x |
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 36 | 13 |
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 11x | 30x |
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | 105 | 38 |
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 10x | 29x |
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | 48 |
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | 30x |
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | 138 |
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | 33x |
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2024
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB |
---|---|---|---|---|---|---|
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 189 | 845 | 2,865 |
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 4x | 15x |
GROMACS [STMV] | ns/day | STMV | yes | 14 | 48 | 126 |
GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 12x |
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB |
---|---|---|---|---|---|---|
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 814 | 2,590 |
GTC | NRF | mpi#proc.in | yes | 1x | 10x | 31x |
ICON
Weather and Climate
A global unified atmosphere model for numerical weather prediction and climate modeling research
VERSION
2.6.7_RC
ACCELERATED FEATURES
- Full model of dynamics and physics
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB |
---|---|---|---|---|---|---|
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 964 | 179 | 109 |
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 5x | 9x |
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 798 | 172 | 95 |
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 5x | 8x |
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
Stable_2Aug2023
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB |
---|---|---|---|---|---|
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.51E+08 | 1.52E+09 |
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 4x |
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.85E+08 | 5.87E+08 |
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x |
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 1.12E+07 |
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 12x |
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | 4.01E+06 |
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 9x |
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | 1.07E+09 |
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 12x |
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_a2f9e61
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB |
---|---|---|---|---|---|
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 920 |
MILC | NRF | Apex Medium | yes | 1x | 31x |
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3.b04
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB |
---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 309 | 1,169 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 18x |
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 321 | 1,225 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 19x |
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 392 | 1,424 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 6x | 20x |
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | 27 | 106 |
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 16x |
NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | 27 | 108 |
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 16x |
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 32 | 127 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 18x |
NAMD is measured by running multiple independent instances using MPS
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_d2105bb
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB |
---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 41 | 12 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 11x | 35x |
Detailed H100 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.4-AT_23.4
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 249 | 505 | 1,020 | 1,997 | 289 | 581 | 1,229 | 2,408 |
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 25x | 50x | 101x | 197x | 28x | 57x | 121x | 238x |
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 253 | 513 | 1,029 | 2,113 | 292 | 593 | 1,239 | 2,747 |
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 24x | 49x | 99x | 202x | 28x | 57x | 119x | 263x |
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 1,109 | 2,245 | 4,534 | 9,216 | 1,246 | 2,508 | 5,149 | 11,308 |
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 19x | 38x | 76x | 155x | 21x | 42x | 86x | 190x |
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 1,132 | 2,293 | 4,632 | 9,443 | 1,292 | 2,588 | 5,263 | 11,507 |
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 20x | 41x | 84x | 171x | 23x | 47x | 95x | 208x |
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 3,898 | 7,720 | 15,449 | 31,657 | 4,280 | 8,504 | 17,043 | 34,274 |
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 17x | 33x | 66x | 135x | 18x | 36x | 73x | 146x |
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 3,990 | 7,956 | 15,571 | 31,360 | 4,344 | 8,831 | 17,888 | 36,412 |
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 17x | 34x | 67x | 135x | 19x | 38x | 77x | 156x |
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 81 | 161 | 322 | 645 | 85 | 170 | 341 | 681 |
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 27x | 54x | 108x | 216x | 29x | 57x | 114x | 229x |
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 179 | 357 | 715 | 1,430 | 195 | 391 | 781 | 1,563 |
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 7x | 14x | 28x | 56x | 8x | 15x | 31x | 61x |
AMBER is measured by running multiple independent instances using MPS
FUN3D
Engineering
Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications
VERSION
14.0.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 29 | 17 | 10 | 10 | 27 | 16 | 10 | 8 |
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 8x | 14x | 24x | 25x | 9x | 15x | 25x | 32x |
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 43 | 24 | 14 | 14 | 40 | 22 | 14 | 10 |
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 9x | 17x | 29x | 28x | 10x | 18x | 30x | 42x |
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | 127 | 67 | 38 | 31 | 116 | 62 | 36 | 23 |
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 9x | 16x | 28x | 35x | 9x | 17x | 30x | 47x |
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 50 | 38 | - | - | 46 | 28 |
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 29x | 38x | - | - | 31x | 52x |
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | 151 | 98 | - | - | 140 | 80 |
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 30x | 47x | - | - | 33x | 57x |
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2024
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 189 | 691 | 1,321 | 2,572 | 5,117 | 776 | 1,440 | 2,596 | 5,164 |
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 4x | 7x | 14x | 27x | 4x | 8x | 14x | 27x |
GROMACS [STMV] | ns/day | STMV | yes | 14 | 40 | 67 | 100 | - | 43 | 72 | 120 | 177 |
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 6x | 10x | - | 3x | 6x | 12x | 17x |
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 740 | 1,338 | 2,381 | 3,953 | 768 | 1,422 | 2,780 | 5,196 |
GTC | NRF | mpi#proc.in | yes | 1x | 9x | 16x | 29x | 48x | 9x | 17x | 33x | 62x |
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
Stable_2Aug2023
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.51E+08 | 1.07E+09 | 1.80E+09 | 3.36E+09 | 4.79E+09 | 1.29E+09 | 2.33E+09 | 4.19E+09 | 7.03E+09 |
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 5x | 10x | 14x | 4x | 7x | 13x | 21x |
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.85E+08 | 4.74E+08 | - | 1.58E+09 | - | 5.19E+08 | 9.63E+08 | 1.74E+09 | 2.84E+09 |
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | - | 9x | - | 3x | 5x | 10x | 16x |
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 9.66E+06 | 1.68E+07 | 2.88E+07 | 3.94E+07 | 1.05E+07 | 1.91E+07 | 3.12E+07 | 4.75E+07 |
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 11x | 18x | 31x | 43x | 11x | 21x | 34x | 52x |
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | 3.35E+06 | 6.06E+06 | 1.30E+07 | 2.55E+07 | 3.90E+06 | 7.76E+06 | 1.53E+07 | 2.91E+07 |
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 9x | 14x | 29x | 58x | 9x | 18x | 35x | 66x |
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | 8.53E+08 | 1.43E+09 | 2.80E+09 | 4.18E+09 | 9.93E+08 | 1.80E+09 | 3.30E+09 | 5.53E+09 |
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 10x | 16x | 32x | 48x | 11x | 21x | 38x | 63x |
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_a2f9e61
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM |
---|---|---|---|---|---|---|---|---|---|---|---|
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 1,286 | 806 | 385 | 1,163 | 623 | 355 | 215 |
MILC | NRF | Apex Medium | yes | 1x | 22x | 35x | 73x | 24x | 45x | 79x | 130x |
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3.b04
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 273 | 550 | 1,106 | 2,209 | 299 | 596 | 1,181 | 2,300 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 4x | 9x | 17x | 34x | 5x | 9x | 18x | 36x |
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 280 | 566 | 1,136 | 2,266 | 306 | 612 | 1,212 | 2,364 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 4x | 9x | 17x | 35x | 5x | 9x | 19x | 36x |
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 339 | 684 | 1,377 | 2,738 | 377 | 759 | 1,490 | 2,910 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 5x | 10x | 19x | 38x | 5x | 11x | 21x | 41x |
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | 23 | 47 | 94 | 188 | 25 | 51 | 101 | 203 |
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 7x | 14x | 29x | 4x | 8x | 15x | 31x |
NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | 24 | 48 | 96 | 193 | 26 | 52 | 104 | 208 |
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 7x | 14x | 29x | 4x | 8x | 16x | 31x |
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 27 | 55 | 110 | 222 | 31 | 61 | 123 | 245 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 8x | 16x | 32x | 4x | 9x | 18x | 35x |
NAMD is measured by running multiple independent instances using MPS
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_d2105bb
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 52 | 27 | 14 | 10 | 46 | 24 | 14 | 10 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 8x | 16x | 30x | 44x | 10x | 18x | 32x | 45x |
Detailed L40S application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.4-AT_23.4
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
---|---|---|---|---|---|---|---|---|
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 983 | 1,972 | 3,994 | 8,259 |
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 16x | 33x | 67x | 139x |
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 1,007 | 2,009 | 4,058 | 8,532 |
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 18x | 36x | 73x | 154x |
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 4,037 | 8,081 | 16,436 | 32,051 |
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 17x | 34x | 70x | 137x |
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 4,065 | 8,259 | 16,728 | 33,529 |
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 17x | 35x | 72x | 144x |
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 91 | 183 | 366 | 732 |
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 31x | 61x | 123x | 246x |
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 191 | 381 | 763 | 1,526 |
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 7x | 15x | 30x | 60x |
AMBER is measured by running multiple independent instances using MPS
FUN3D
Engineering
Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications
VERSION
14.0.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
---|---|---|---|---|---|---|---|---|
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | - | 66 | 34 | 19 |
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | - | 4x | 7x | 13x |
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 166 | 85 | 45 | 25 |
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 2x | 5x | 9x | 16x |
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | - | 241 | 127 | 70 |
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | - | 4x | 9x | 15x |
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 178 | 97 |
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 8x | 15x |
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | - | 295 |
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | - | 15x |
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2024
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
---|---|---|---|---|---|---|---|---|
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 189 | 649 | 1,434 | 2,614 | 5,198 |
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 8x | 14x | 27x |
GROMACS [STMV] | ns/day | STMV | yes | 14 | 43 | 69 | 103 | - |
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 6x | 10x | - |
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
---|---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 442 | 809 | 1,591 | 3,066 |
GTC | NRF | mpi#proc.in | yes | 1x | 5x | 10x | 19x | 37x |
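For metrics where bigger is better, the throughput row can be turned into a multi-GPU scaling efficiency. The following is a minimal sketch using the GTC Mpush/Sec values from the L40S table; the function name `scaling_efficiency` and the choice of the smallest GPU count as the baseline are illustrative assumptions, not part of the published methodology.

```python
# Throughput values (Mpush/Sec) copied from the GTC row of the L40S table.
gtc_mpush_per_sec = {1: 442, 2: 809, 4: 1591, 8: 3066}


def scaling_efficiency(throughput_by_gpus):
    """Return {gpu_count: efficiency}; 1.0 means perfect linear scaling
    relative to the smallest GPU count present (an assumed baseline)."""
    base_gpus = min(throughput_by_gpus)
    per_gpu_base = throughput_by_gpus[base_gpus] / base_gpus
    return {
        n: value / (n * per_gpu_base)
        for n, value in sorted(throughput_by_gpus.items())
    }


efficiency = scaling_efficiency(gtc_mpush_per_sec)
for n, eff in efficiency.items():
    print(f"{n}x L40S: {eff:.0%} of linear scaling")
```

With these figures the efficiency stays at roughly 87% or better up to 8 GPUs. The same function applies to any other "bigger is better" row, such as the ns/day rows.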
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_a2f9e61
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S |
---|---|---|---|---|---|---|---|
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 4,047 | 2,051 | 1,343 |
MILC | NRF | Apex Medium | yes | 1x | 7x | 14x | 21x |
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3.b04
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
---|---|---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 230 | 457 | 900 | 1,816 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 4x | 7x | 14x | 28x |
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 230 | 457 | 910 | 1,803 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 4x | 7x | 14x | 28x |
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 299 | 602 | 1,193 | 2,369 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 8x | 17x | 33x |
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | 17 | 34 | 67 | 135 |
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 3x | 5x | 10x | 20x |
NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | 17 | 35 | 70 | 139 |
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 3x | 5x | 10x | 21x |
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 22 | 44 | 88 | 176 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 6x | 13x | 25x |
NAMD is measured by running multiple independent instances using MPS
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_d2105bb
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
---|---|---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 171 | 86 | 44 | 23 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 2x | 4x | 10x | 19x |
Detailed L4 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.4-AT_23.4
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L4 | 2x L4 | 4x L4 | 8x L4 |
---|---|---|---|---|---|---|---|---|
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 52 | 106 | 212 | 426 |
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 5x | 10x | 21x | 42x |
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 54 | 108 | 215 | 433 |
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 5x | 10x | 21x | 41x |
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 259 | 519 | 1,039 | 2,142 |
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 4x | 9x | 17x | 36x |
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 265 | 533 | 1,066 | 2,132 |
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 5x | 10x | 19x | 39x |
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 1,225 | 2,450 | 4,931 | 9,899 |
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 5x | 10x | 21x | 42x |
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 1,241 | 2,481 | 5,018 | 10,161 |
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 5x | 11x | 22x | 44x |
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 20 | 40 | 81 | 162 |
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 7x | 14x | 27x | 54x |
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 114 | 227 | 455 | 910 |
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 4x | 9x | 18x | 36x |
AMBER is measured by running multiple independent instances using MPS
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x L4 | 2x L4 | 4x L4 | 8x L4 |
---|---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 184 | 334 | 664 | 1,234 |
GTC | NRF | mpi#proc.in | yes | 1x | 2x | 4x | 8x | 15x |
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_a2f9e61
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 2x L4 | 4x L4 | 8x L4 |
---|---|---|---|---|---|---|---|
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 5,875 | 3,002 | 1,587 |
MILC | NRF | Apex Medium | yes | 1x | 5x | 9x | 18x |
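For rows where smaller is better (times), a raw ratio against the single CPU-only server can be computed as CPU time divided by GPU time. The sketch below parses one pipe-delimited row, using the MILC L4 figures; `parse_number` and `raw_speedups` are hypothetical helper names introduced here. This raw ratio is not the NRF column: the NRF values come from the page's own multi-server CPU measurements, so the two can differ.

```python
def parse_number(cell):
    """Convert a table cell such as '31,577' to float; '-' (no result) gives None."""
    cell = cell.strip()
    if cell == "-":
        return None
    return float(cell.replace(",", ""))


def raw_speedups(row, gpu_labels):
    """Ratio of each GPU result to the CPU-only result for one table row.

    Column layout assumed from the tables on this page:
    Application | Metric | Test Modules | Bigger is better | CPU-only | GPU results...
    """
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    bigger_is_better = cells[3] == "yes"
    cpu = parse_number(cells[4])
    result = {}
    for label, cell in zip(gpu_labels, cells[5:]):
        value = parse_number(cell)
        if value is None:
            result[label] = None  # configuration not measured
        elif bigger_is_better:
            result[label] = value / cpu
        else:
            result[label] = cpu / value  # times: smaller is better
    return result


# Row copied from the MILC L4 table (Total Time in seconds).
milc_row = "MILC | Total Time (sec) | Apex Medium | no | 31,577 | 5,875 | 3,002 | 1,587 |"
speedups = raw_speedups(milc_row, ["2x L4", "4x L4", "8x L4"])
for label, ratio in speedups.items():
    print(f"{label}: {ratio:.2f}x faster than one CPU-only server")
```

Here the raw ratios come out somewhat higher than the 5x, 9x, and 18x listed in the NRF row, which is consistent with NRF being derived from measured multi-server CPU scaling rather than from a single-server ratio.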
Detailed A100 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.4-AT_23.4
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 182 | 368 | 741 | 1,470 | 174 | 336 | 696 | 1,414 |
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 18x | 36x | 73x | 145x | 17x | 33x | 69x | 139x |
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 186 | 370 | 750 | 1,485 | 177 | 343 | 712 | 1,460 |
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 18x | 35x | 72x | 142x | 17x | 33x | 68x | 140x |
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 798 | 1,606 | 3,221 | 6,379 | 771 | 1,487 | 3,062 | 6,349 |
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 13x | 27x | 54x | 107x | 13x | 25x | 51x | 107x |
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 817 | 1,630 | 3,246 | 6,559 | 790 | 1,577 | 3,181 | 6,655 |
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 29x | 59x | 119x | 14x | 29x | 58x | 120x |
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 2,888 | 5,738 | 11,503 | 23,868 | 2,875 | 5,582 | 11,506 | 23,739 |
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 49x | 102x | 12x | 24x | 49x | 101x |
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 2,966 | 5,877 | 11,721 | 23,515 | 2,866 | 5,912 | 11,891 | 24,101 |
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 25x | 50x | 101x | 12x | 25x | 51x | 104x |
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 53 | 107 | 214 | 427 | 52 | 105 | 210 | 420 |
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 18x | 36x | 72x | 143x | 18x | 35x | 70x | 141x |
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 134 | 269 | 537 | 1,074 | 135 | 270 | 539 | 1,078 |
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 5x | 10x | 21x | 42x | 5x | 11x | 21x | 42x |
AMBER is measured by running multiple independent instances using MPS
FUN3D
Engineering
Suite of tools actively developed at NASA for aeronautics and space technology that models fluid flow
VERSION
14.0.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 48 | 26 | 16 | 11 | 48 | 26 | 15 | 12 |
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 9x | 15x | 21x | 5x | 9x | 16x | 21x |
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 72 | 40 | 23 | 16 | 72 | 39 | 22 | 16 |
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 6x | 10x | 17x | 26x | 6x | 10x | 18x | 26x |
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | 195 | 103 | 57 | 35 | 199 | 103 | 57 | 35 |
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 6x | 11x | 19x | 31x | 5x | 10x | 19x | 31x |
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 81 | 46 | - | - | 81 | 47 |
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 18x | 31x | - | - | 18x | 31x |
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | 228 | 124 | - | - | - | 128 |
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 20x | 37x | - | - | - | 36x |
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2024
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 189 | 453 | 933 | 1,672 | - | 480 | 883 | 1,351 | 3,263 |
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 5x | 9x | - | 3x | 5x | 7x | 17x |
GROMACS [STMV] | ns/day | STMV | yes | 14 | 24 | 45 | 81 | 130 | 23 | 41 | 66 | - |
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 3x | 7x | 12x | 2x | 3x | 6x | - |
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 489 | 906 | 1,787 | 3,327 | 490 | 888 | 1,726 | 2,796 |
GTC | NRF | mpi#proc.in | yes | 1x | 6x | 11x | 21x | 40x | 6x | 11x | 21x | 34x |
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
Stable_2Aug2023
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.51E+08 | 6.96E+08 | 1.30E+09 | 2.34E+09 | 4.05E+09 | 6.65E+08 | 1.21E+09 | 1.99E+09 | - |
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 4x | 7x | 12x | 2x | 3x | 6x | - |
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.85E+08 | 3.01E+08 | 5.62E+08 | 1.01E+09 | 1.67E+09 | 2.92E+08 | 5.41E+08 | 9.28E+08 | - |
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 3x | 6x | 10x | 2x | 3x | 5x | - |
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 5.66E+06 | 1.05E+07 | 1.76E+07 | 2.76E+07 | 5.70E+06 | 1.01E+07 | 1.70E+07 | 1.95E+07 |
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 5x | 11x | 19x | 30x | 5x | 11x | 18x | 21x |
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | 2.21E+06 | 4.40E+06 | 8.73E+06 | 1.66E+07 | 2.08E+06 | 4.21E+06 | 8.21E+06 | 1.58E+07 |
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 6x | 10x | 20x | 38x | 5x | 10x | 19x | 36x |
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | 5.49E+08 | 1.02E+09 | 1.83E+09 | 3.16E+09 | 5.29E+08 | 9.28E+08 | 1.51E+09 | 1.58E+09 |
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 6x | 12x | 21x | 36x | 5x | 11x | 17x | 18x |
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_a2f9e61
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB |
---|---|---|---|---|---|---|---|---|---|---|---|
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 2,035 | 1,188 | 625 | 358 | 2,090 | 1,119 | 612 |
MILC | NRF | Apex Medium | yes | 1x | 14x | 24x | 45x | 78x | 13x | 25x | 46x |
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3.b04
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
---|---|---|---|---|---|---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 176 | 350 | 687 | 1,381 | 173 | 340 | 687 | 1,370 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 5x | 11x | 21x | 3x | 5x | 11x | 21x |
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 180 | 358 | 714 | 1,421 | 178 | 350 | 704 | 1,409 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 3x | 5x | 11x | 22x | 3x | 5x | 11x | 22x |
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 220 | 436 | 867 | 1,734 | 216 | 424 | 850 | 1,705 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 3x | 6x | 12x | 24x | 3x | 6x | 12x | 24x |
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 6.58 | 15 | 29 | 58 | 116 | 14 | 28 | 57 | 114 |
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 4x | 9x | 18x | 2x | 4x | 9x | 17x |
NAMD [stmv_nptsr_cuda] | ns/day | stmv_nptsr_cuda | yes | 6.71 | 15 | 30 | 59 | 119 | 15 | 29 | 58 | 117 |
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 2x | 4x | 9x | 18x | 2x | 4x | 9x | 17x |
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 17 | 34 | 69 | 137 | 17 | 33 | 67 | 135 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 2x | 5x | 10x | 20x | 2x | 5x | 10x | 19x |
NAMD is measured by running multiple independent instances using MPS
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_d2105bb
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A100 SXM4 80GB | 2x A100 SXM4 80GB | 4x A100 SXM4 80GB | 8x A100 SXM4 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 77 | 40 | 21 | 13 | 79 | 41 | 22 | 15 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 4x | 11x | 20x | 33x | 4x | 11x | 20x | 30x |
Detailed A30 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.4-AT_23.4
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
---|---|---|---|---|---|---|---|---|
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 89 | 180 | 356 | 732 |
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 9x | 18x | 35x | 72x |
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 91 | 183 | 365 | 743 |
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 9x | 18x | 35x | 71x |
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 408 | 822 | 1,625 | 3,334 |
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 7x | 14x | 27x | 56x |
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 415 | 841 | 1,669 | 3,401 |
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 8x | 15x | 30x | 62x |
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 1,514 | 3,005 | 5,974 | 12,461 |
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 6x | 13x | 25x | 53x |
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 1,540 | 3,053 | 6,171 | 12,483 |
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 7x | 13x | 27x | 54x |
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 29 | 58 | 116 | 231 |
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 10x | 19x | 39x | 78x |
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 97 | 194 | 388 | 775 |
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 4x | 8x | 15x | 30x |
AMBER is measured by running multiple independent instances using MPS
FUN3D
Engineering
Suite of tools actively developed at NASA for aeronautics and space technology that models fluid flow
VERSION
14.0.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
---|---|---|---|---|---|---|---|---|
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 186 | 97 | 49 | 26 | 17 |
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 9x | 14x |
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 257 | 142 | 73 | 39 | 24 |
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 2x | 6x | 10x | 17x |
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 735 | - | 201 | 106 | 59 |
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | - | 5x | 10x | 18x |
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 1,088 | - | - | 152 | 83 |
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 10x | 18x |
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 3,117 | - | - | - | 235 |
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | - | 19x |
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
---|---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 287 | 533 | 1,050 | 1,872 |
GTC | NRF | mpi#proc.in | yes | 1x | 3x | 6x | 13x | 22x |
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
Stable_2Aug2023
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
---|---|---|---|---|---|---|---|---|
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.37E+06 | 3.09E+06 | 5.86E+06 | 1.04E+07 | 1.40E+07 |
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 3x | 5x | 11x | 15x |
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 5.09E+05 | 1.12E+06 | 2.23E+06 | 4.43E+06 | 8.55E+06 |
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 2x | 6x | 10x | 19x |
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.03E+08 | 2.67E+08 | 4.97E+08 | 8.58E+08 | 1.10E+09 |
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 3x | 5x | 10x | 13x |
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_a2f9e61
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
---|---|---|---|---|---|---|---|---|
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 4,701 | 2,030 | 1,084 | 713 |
MILC | NRF | Apex Medium | yes | 1x | 6x | 14x | 26x | 39x |
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3.b04
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
---|---|---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | - | 183 | 367 | 728 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | - | 3x | 6x | 11x |
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | - | 188 | 376 | 746 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | - | 3x | 6x | 11x |
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 111 | 222 | 445 | 886 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 2x | 3x | 6x | 12x |
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | - | - | 34 | 69 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | - | - | 5x | 10x |
NAMD is measured by running multiple independent instances using MPS
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_d2105bb
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
---|---|---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 159 | 81 | 42 | 23 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 2x | 4x | 10x | 19x |
Detailed A40 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.4-AT_23.4
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
---|---|---|---|---|---|---|---|---|
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.14 | 97 | 197 | 397 | 819 |
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 10x | 19x | 39x | 81x |
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.44 | 99 | 200 | 403 | 839 |
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 9x | 19x | 39x | 80x |
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 59.60 | 491 | 996 | 1,988 | 4,028 |
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 8x | 17x | 33x | 68x |
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 55.29 | 503 | 1,014 | 2,040 | 4,248 |
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 9x | 18x | 37x | 77x |
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 234.70 | 1,925 | 3,884 | 7,769 | 16,230 |
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 8x | 17x | 33x | 69x |
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 232.72 | 1,952 | 3,977 | 8,037 | 16,580 |
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 8x | 17x | 35x | 71x |
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 2.98 | 32 | 63 | 127 | 254 |
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 11x | 21x | 43x | 85x |
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.62 | 119 | 238 | 475 | 950 |
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 5x | 9x | 19x | 37x |
AMBER is measured by running multiple independent instances using MPS
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2024
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
---|---|---|---|---|---|---|---|---|
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 189 | 314 | 625 | 1,113 | 2,534 |
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 3x | 6x | 13x |
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
---|---|---|---|---|---|---|---|---|
GTC | Mpush/Sec | mpi#proc.in | yes | 89 | 303 | 553 | 1,088 | 1,926 |
GTC | NRF | mpi#proc.in | yes | 1x | 3x | 7x | 13x | 23x |
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_a2f9e61
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
---|---|---|---|---|---|---|---|---|
MILC | Total Time (sec) | Apex Medium | no | 31,577 | 6,005 | 3,094 | 1,701 | 1,034 |
MILC | NRF | Apex Medium | yes | 1x | 5x | 9x | 17x | 27x |
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3.b04
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
---|---|---|---|---|---|---|---|---|
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 64.49 | 105 | 211 | 423 | 845 |
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 2x | 3x | 7x | 13x |
NAMD [apoa1_nptsr_cuda] | ns/day | apoa1_nptsr_cuda | yes | 65.19 | 109 | 221 | 441 | 885 |
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 2x | 3x | 7x | 14x |
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 71.14 | 146 | 295 | 593 | 1,187 |
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 2x | 4x | 8x | 17x |
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 6.97 | 11 | 21 | 42 | 85 |
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 2x | 3x | 6x | 12x |
NAMD is measured by running multiple independent instances using MPS
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_d2105bb
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
Application | Metric | Test Modules | Bigger is better | Dual Sapphire Rapids 8480+ (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
---|---|---|---|---|---|---|---|---|
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 386 | 203 | 103 | 53 | 34 |
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 2x | 3x | 8x | 13x |