NVIDIA HPC Application Performance
Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA Data Center GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.
The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then use linear scaling to extrapolate beyond 8 servers. The NRF varies by application.
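As a rough illustration of the linear-scaling idea, the sketch below estimates an NRF from a single CPU-only baseline and a single GPU-server result. The function name `node_replacement_factor` and its parameters are hypothetical, and the single-baseline ratio is a simplifying assumption: the published figures are derived from measurements on up to 8 CPU-only servers, so they may not match this ratio exactly, particularly for multi-GPU rows.

```python
def node_replacement_factor(cpu_value, gpu_value, bigger_is_better=True):
    """Estimate how many CPU-only servers one GPU server replaces.

    Assumption (not the published method): CPU performance scales
    linearly with server count from a single-server baseline.
    """
    if cpu_value <= 0 or gpu_value <= 0:
        raise ValueError("measurements must be positive")
    if bigger_is_better:
        # Throughput metrics (e.g. ns/day): N CPU servers deliver N * cpu_value.
        return gpu_value / cpu_value
    # Time metrics (e.g. seconds): N CPU servers take cpu_value / N.
    return cpu_value / gpu_value


# AMBER PME-Cellulose_NPT_4fs, 1x H200 vs. the CPU-only baseline (ns/day).
amber_nrf = node_replacement_factor(11.71, 327)
# MILC Apex Medium, 1x H200 vs. the CPU-only baseline (total time in seconds).
milc_nrf = node_replacement_factor(13735, 981, bigger_is_better=False)
print(f"AMBER: {amber_nrf:.1f}x, MILC: {milc_nrf:.1f}x")
```

Rounded to whole numbers, these two estimates agree with the 28x and 14x entries in the H200 tables; other rows can differ because the published values rest on the multi-server CPU measurements rather than a single baseline.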
Detailed H200 application performance data is located below in alphabetical order.
AMBER
 
  Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
24-AT_24
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 11.71 | 327 | 652 | 1,333 | 2,664 | 293 | 588 | 1,176 | 2,359 | 
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 28x | 56x | 114x | 227x | 25x | 50x | 100x | 201x | 
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 11.69 | 330 | 669 | 1,395 | 2,782 | 299 | 596 | 1,193 | 2,398 | 
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 28x | 57x | 119x | 238x | 26x | 51x | 102x | 205x | 
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 93.36 | 1,406 | 2,852 | 5,690 | 12,468 | 1,263 | 2,527 | 5,055 | 10,180 | 
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 15x | 31x | 61x | 134x | 14x | 27x | 54x | 109x | 
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 99.50 | 1,430 | 2,897 | 5,863 | 11,854 | 1,289 | 2,581 | 5,226 | 10,422 | 
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 14x | 29x | 59x | 119x | 13x | 26x | 53x | 105x | 
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 377.04 | 4,689 | 9,485 | 19,479 | 37,687 | 4,250 | 8,422 | 17,056 | 31,382 | 
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 25x | 52x | 100x | 11x | 22x | 45x | 83x | 
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 397.04 | 4,851 | 9,692 | 19,759 | 38,246 | 4,337 | 8,640 | 17,269 | 32,541 | 
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 50x | 96x | 11x | 22x | 43x | 82x | 
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.69 | 94 | 187 | 375 | 749 | 91 | 182 | 364 | 728 | 
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 25x | 51x | 102x | 203x | 25x | 49x | 99x | 197x | 
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.07 | 200 | 400 | 799 | 1,599 | 182 | 364 | 728 | 1,456 | 
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 16x | 32x | 64x | 7x | 15x | 29x | 58x | 
AMBER is measured by running multiple independent instances using MPS
Chroma
 
  Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V2025.01
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chroma | Final Timestep Time (Sec) | HMC Medium | no | 10,037 | 153 | 88 | 53 | 35 | 160 | 93 | 59 | 46 | 
| Chroma | NRF | HMC Medium | yes | 1x | 65x | 116x | 193x | 289x | 63x | 110x | 175x | 224x | 
FUN3D
 
  Engineering
Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology
VERSION
14.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 112 | 24 | 14 | 9 | 7 | 25 | 15 | 9 | 8 | 
| Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 10x | 16x | 20x | 4x | 10x | 15x | 18x | 
| Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 154 | 33 | 19 | 11 | 8 | 35 | 20 | 11 | 9 | 
| Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 5x | 13x | 23x | 31x | 4x | 12x | 21x | 26x | 
| Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 474 | 91 | 48 | 26 | 16 | 97 | 51 | 28 | 19 | 
| Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 5x | 15x | 27x | 44x | 5x | 14x | 25x | 36x | 
| Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 628 | - | - | 36 | 20 | - | - | 38 | 25 | 
| Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 23x | 41x | - | - | 22x | 34x | 
| Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 2,011 | - | - | 102 | 54 | - | - | 109 | 65 | 
| Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 29x | 55x | - | - | 27x | 45x | 
GROMACS
 
  Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
h-bond - 2025-rc
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 362 | 857 | 1,627 | 2,673 | 5,330 | 773 | 1,450 | 2,700 | 5,430 | 
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 4x | 7x | 15x | 2x | 4x | 7x | 15x | 
| GROMACS [STMV] | ns/day | STMV | yes | 20 | 44 | 76 | 131 | 198 | 41 | 70 | 123 | 153 | 
| GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 8x | 13x | 2x | 3x | 7x | 10x | 
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
 
  Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | mpi#proc.in | yes | 146 | 821 | 1,532 | 2,999 | 5,408 | 749 | 1,407 | 2,691 | 4,780 | 
| GTC | NRF | mpi#proc.in | yes | 1x | 6x | 11x | 22x | 40x | 5x | 10x | 20x | 35x | 
ICON
 
  Weather and Climate
A global unified atmosphere model for numerical weather prediction and climate modeling research
VERSION
2024.8_RC
ACCELERATED FEATURES
- Full model of dynamics and physics
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|
| ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 587 | 171 | 141 | 113 | 98 | 180 | 148 | 116 | 
| ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 4x | 5x | 6x | 3x | 4x | 5x | 
| ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 466 | 143 | 102 | 79 | 67 | 150 | 107 | 82 | 
| ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 3x | 5x | 6x | 7x | 3x | 4x | 6x | 
LAMMPS
 
  Molecular Dynamics
Classical molecular dynamics package
VERSION
patch_4Feb2025
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.95E+08 | 1.44E+09 | 2.69E+09 | 4.72E+09 | 7.80E+09 | 1.32E+09 | 2.45E+09 | 3.78E+09 | 6.33E+09 | 
| LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 4x | 7x | 13x | 21x | 3x | 6x | 10x | 17x | 
| LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.44E+08 | 5.75E+08 | 1.09E+09 | 1.95E+09 | 3.17E+09 | 5.28E+08 | 1.00E+09 | 1.70E+09 | 2.52E+09 | 
| LAMMPS [EAM] | NRF | EAM | yes | 1x | 4x | 8x | 14x | 23x | 4x | 7x | 12x | 18x | 
| LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.93E+06 | 1.15E+07 | 2.05E+07 | 3.33E+07 | 4.96E+07 | 1.06E+07 | 1.91E+07 | 2.98E+07 | 4.32E+07 | 
| LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 6x | 16x | 26x | 38x | 6x | 15x | 23x | 33x | 
| LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.61E+06 | 4.24E+06 | 8.49E+06 | 1.69E+07 | 3.36E+07 | 3.88E+06 | 7.71E+06 | 1.53E+07 | 3.05E+07 | 
| LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 3x | 7x | 12x | 24x | 2x | 6x | 11x | 22x | 
| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.21E+08 | 1.03E+09 | 1.91E+09 | 3.46E+09 | 5.89E+09 | 9.43E+08 | 1.75E+09 | 3.04E+09 | 4.93E+09 | 
| LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 10x | 18x | 31x | 4x | 9x | 16x | 26x | 
MILC
 
  Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_cde2498
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MILC | Total Time (sec) | Apex Medium | no | 13,735 | 981 | 534 | 305 | 191 | 1,018 | 580 | 334 | 263 | 
| MILC | NRF | Apex Medium | yes | 1x | 14x | 23x | 40x | 64x | 13x | 21x | 37x | 46x | 
NAMD
 
  Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NAMD [LaINDY ColVars] | ns/day | LaINDY ColVars | yes | 50.56 | 89 | 177 | 352 | 698 | 84 | 164 | 327 | 651 | 
| NAMD [LaINDY ColVars] | NRF | LaINDY ColVars | yes | 1x | 2x | 4x | 7x | 14x | 2x | 3x | 6x | 13x | 
| NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 108.79 | 392 | 784 | 1,545 | 3,017 | 357 | 700 | 1,414 | 2,804 | 
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 7x | 14x | 28x | 3x | 6x | 13x | 26x | 
| NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 10.53 | 25 | 51 | 102 | 203 | 23 | 46 | 93 | 185 | 
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 5x | 10x | 19x | 2x | 4x | 9x | 18x | 
| NAMD [COVID-19 Spike Assembly] | ns/day | COVID-19 Spike Assembly | yes | 0.75 | 3 | 6 | 11 | 18 | 3 | 5 | 8 | - | 
| NAMD [COVID-19 Spike Assembly] | NRF | COVID-19 Spike Assembly | yes | 1x | 4x | 8x | 15x | 24x | 4x | 6x | 11x | - | 
| NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 10.87 | 32 | 64 | 128 | 257 | 29 | 58 | 116 | 232 | 
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 6x | 12x | 24x | 3x | 5x | 11x | 21x | 
NAMD is measured by running multiple independent instances using MPS, except for the NAMD [COVID-19 Spike Assembly] dataset
Trifan A, Gorgun D, Salim M, et al. Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action. The International Journal of High Performance Computing Applications. 2022;36(5-6):603-623. doi:10.1177/10943420221113513
D. B. Sauer, N. Trebesch, J. J. Marden, N. Cocco, J. Song, A. Koide, S. Koide, E. Tajkhorshid, and D.-N. Wang. "Structural basis for the reaction cycle of DASS dicarboxylate transporters." eLife. 9, e61350 (2020). https://doi.org/10.7554/eLife.61350
Quantum Espresso
 
  Material Science (Quantum Chemistry)
An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale
VERSION
V7.4
ACCELERATED FEATURES
- linear algebra (matrix multiply)
- explicit computational kernels
- 3D FFTs
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 2x H200 | 4x H200 | 8x H200 | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|
| Quantum Espresso | Total CPU Time (Sec) | GRIR443 | no | 784 | 114 | 89 | 50 | 116 | 77 | 54 | 
| Quantum Espresso | NRF | GRIR443 | yes | 1x | 12x | 16x | 28x | 12x | 19x | 26x | 
RELION
 
  Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
5.0.0
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|
| Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 8,981 | 2,355 | 1,231 | 1,051 | 2,355 | 1,231 | 1,051 | 977 | 
| Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 9x | 4x | 7x | 9x | 9x | 
RTM
 
  Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
nvidia_2024_01
ACCELERATED FEATURES
- Batch algorithm
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 21,047 | 194,141 | 385,616 | 770,672 | 1,545,937 | 184,860 | 368,249 | 736,485 | 1,476,228 | 
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 9x | 18x | 37x | 73x | 9x | 17x | 35x | 70x | 
| RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 7,213 | 31,581 | 62,562 | 125,334 | 250,342 | 25,816 | 51,604 | 103,104 | 205,718 | 
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 4x | 9x | 17x | 35x | 4x | 7x | 14x | 29x | 
| RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 7,213 | 30,527 | 59,893 | 119,536 | 238,880 | 28,738 | 57,080 | 113,564 | 227,150 | 
| RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 8x | 17x | 33x | 4x | 8x | 16x | 31x | 
SPECFEM3D
 
  Geoscience
Simulates seismic wave propagation
VERSION
4.1.1
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H200 | 2x H200 | 4x H200 | 8x H200 | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 186 | 38 | 21 | 12 | 9 | 41 | 22 | 12 | 8 | 
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 5x | 10x | 18x | 24x | 4x | 9x | 17x | 25x | 
Detailed GH200 96GB application performance data is located below in alphabetical order.
AMBER
 
  Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
24-AT_24
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.40 | 305 | 1,296 | 
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 29x | 125x | 
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.43 | 307 | 1,302 | 
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 29x | 125x | 
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 82.11 | 1,339 | 5,510 | 
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 16x | 67x | 
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 90.62 | 1,370 | 5,642 | 
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 62x | 
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 358.07 | 4,827 | 18,286 | 
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 13x | 51x | 
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 365.31 | 4,916 | 18,673 | 
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 51x | 
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.28 | 101 | - | 
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 31x | - | 
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 23.08 | 205 | - | 
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 9x | - | 
AMBER is measured by running multiple independent instances using MPS
Chroma
 
  Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V2024.10
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| Chroma | Final Timestep Time (Sec) | HMC Medium | no | 9,240 | 164 | 61 | 
| Chroma | NRF | HMC Medium | yes | 1x | 58x | 155x | 
FUN3D
 
  Engineering
Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology
VERSION
14.0.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 127 | 24 | 10 | 
| Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 7x | 17x | 
| Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 179 | 36 | 13 | 
| Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 8x | 21x | 
| Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 498 | 105 | 38 | 
| Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 7x | 19x | 
| Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 682 | - | 48 | 
| Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | 19x | 
| Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 2,155 | - | 138 | 
| Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | 23x | 
GROMACS
 
  Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2024.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 370 | 834 | 3,293 | 
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 9x | 
| GROMACS [STMV] | ns/day | STMV | yes | 19 | 47 | 120 | 
| GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 8x | 
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
 
  Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | mpi#proc.in | yes | 136 | 812 | 2,874 | 
| GTC | NRF | mpi#proc.in | yes | 1x | 6x | 22x | 
ICON
 
  Weather and Climate
A global unified atmosphere model for numerical weather prediction and climate modeling research
VERSION
2024.8_RC
ACCELERATED FEATURES
- Full model of dynamics and physics
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 575 | 175 | 108 | 
| ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | 
| ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 459 | 147 | 81 | 
| ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 3x | 6x | 
LAMMPS
 
  Molecular Dynamics
Classical molecular dynamics package
VERSION
stable_29Aug2024
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 
|---|---|---|---|---|---|
| LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.28E+08 | 1.56E+09 | 
| LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 
| LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.33E+08 | 6.10E+08 | 
| LAMMPS [EAM] | NRF | EAM | yes | 1x | 5x | 
| LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.84E+06 | 1.14E+07 | 
| LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 9x | 
| LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.53E+06 | 3.83E+06 | 
| LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 3x | 
| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.99E+08 | 1.08E+09 | 
| LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 6x | 
MILC
 
  Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_cde2498
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| MILC | Total Time (sec) | Apex Medium | no | 16,570 | 935 | 306 | 
| MILC | NRF | Apex Medium | yes | 1x | 16x | 48x | 
NAMD
 
  Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| NAMD [LaINDY ColVars] | ns/day | LaINDY ColVars | yes | 44.89 | 114 | 441 | 
| NAMD [LaINDY ColVars] | NRF | LaINDY ColVars | yes | 1x | 3x | 10x | 
| NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 97.16 | 392 | 1,505 | 
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 15x | 
| NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 10.06 | 26 | 102 | 
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 3x | 10x | 
| NAMD [COVID-19 Spike Assembly] | ns/day | COVID-19 Spike Assembly | yes | 0.78 | 3 | 11 | 
| NAMD [COVID-19 Spike Assembly] | NRF | COVID-19 Spike Assembly | yes | 1x | 4x | 14x | 
| NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 10.49 | 32 | 126 | 
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 12x | 
NAMD is measured by running multiple independent instances using MPS, except for the NAMD [COVID-19 Spike Assembly] dataset
Trifan A, Gorgun D, Salim M, et al. Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action. The International Journal of High Performance Computing Applications. 2022;36(5-6):603-623. doi:10.1177/10943420221113513
D. B. Sauer, N. Trebesch, J. J. Marden, N. Cocco, J. Song, A. Koide, S. Koide, E. Tajkhorshid, and D.-N. Wang. "Structural basis for the reaction cycle of DASS dicarboxylate transporters." eLife. 9, e61350 (2020). https://doi.org/10.7554/eLife.61350
RTM
 
  Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
nvidia_2024_01
ACCELERATED FEATURES
- Batch algorithm
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 21,047 | 178,321 | 708,595 | 
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 34x | 
| RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 7,213 | 31,584 | 124,223 | 
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 4x | 17x | 
| RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 7,213 | 29,320 | 115,804 | 
| RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 16x | 
SPECFEM3D
 
  Geoscience
Simulates seismic wave propagation
VERSION
4.1.1
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB | 
|---|---|---|---|---|---|---|
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 199 | 41 | 13 | 
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 4x | 18x | 
Detailed H100 application performance data is located below in alphabetical order.
AMBER
 
  Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
24-AT_24
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 11.71 | 308 | 616 | 1,262 | 2,476 | 281 | 555 | 1,109 | 2,456 | 
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 26x | 53x | 108x | 211x | 24x | 47x | 95x | 210x | 
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 11.69 | 314 | 629 | 1,269 | 2,595 | 285 | 563 | 1,125 | 2,367 | 
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 27x | 54x | 109x | 222x | 24x | 48x | 96x | 202x | 
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 93.36 | 1,335 | 2,664 | 5,397 | 11,295 | 1,236 | 2,454 | 4,898 | 9,766 | 
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 14x | 29x | 58x | 121x | 13x | 26x | 52x | 105x | 
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 99.50 | 1,365 | 2,740 | 5,606 | 11,840 | 1,254 | 2,513 | 5,246 | 9,974 | 
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 14x | 28x | 56x | 119x | 13x | 25x | 53x | 100x | 
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 377.04 | 4,573 | 9,286 | 18,515 | 36,090 | 4,239 | 8,453 | 17,804 | 32,754 | 
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 25x | 49x | 96x | 11x | 22x | 47x | 87x | 
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 397.04 | 4,729 | 9,395 | 19,265 | 38,119 | 4,293 | 8,528 | 17,029 | 33,107 | 
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 49x | 96x | 11x | 21x | 43x | 83x | 
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.69 | 89 | 178 | 357 | 713 | 92 | 184 | 368 | 736 | 
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 24x | 48x | 97x | 193x | 25x | 50x | 100x | 199x | 
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.07 | 193 | 386 | 771 | 1,543 | 181 | 362 | 723 | 1,446 | 
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 15x | 31x | 62x | 7x | 14x | 29x | 58x | 
AMBER is measured by running multiple independent instances using MPS
Chroma
 
  Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V2025.01
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chroma | Final Timestep Time (Sec) | HMC Medium | no | 10,037 | 261 | 106 | 63 | 40 | 190 | 109 | 68 | 49 | 
| Chroma | NRF | HMC Medium | yes | 1x | 38x | 96x | 164x | 256x | 53x | 94x | 151x | 209x | 
FUN3D
 
  Engineering
Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology
VERSION
14.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 112 | 27 | 16 | 10 | 8 | 29 | 17 | 10 | 10 | 
| Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 4x | 9x | 15x | 19x | 4x | 9x | 14x | 15x | 
| Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 154 | 38 | 21 | 12 | 8 | 40 | 22 | 12 | 10 | 
| Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 4x | 12x | 20x | 29x | 4x | 11x | 20x | 25x | 
| Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 474 | 104 | 54 | 29 | 18 | 110 | 58 | 30 | 20 | 
| Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 5x | 13x | 24x | 40x | 4x | 12x | 23x | 35x | 
| Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 628 | - | - | 41 | 23 | - | - | 43 | 26 | 
| Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 20x | 37x | - | - | 19x | 33x | 
| Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 2,011 | - | - | 116 | 61 | - | - | 125 | 68 | 
| Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 25x | 48x | - | - | 24x | 43x | 
GROMACS
 
  Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
h-bond - 2025-rc
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 362 | 823 | 1,540 | 2,700 | 5,295 | 767 | 1,432 | 2,625 | 5,326 | 
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 4x | 7x | 15x | 2x | 4x | 7x | 15x | 
| GROMACS [STMV] | ns/day | STMV | yes | 20 | 44 | 75 | 130 | 200 | 41 | 70 | 121 | 144 | 
| GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 8x | 13x | 2x | 3x | 7x | 9x | 
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
 
  Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | mpi#proc.in | yes | 146 | 769 | 1,436 | 2,805 | 5,235 | 741 | 1,396 | 2,679 | 4,819 | 
| GTC | NRF | mpi#proc.in | yes | 1x | 5x | 10x | 20x | 38x | 5x | 10x | 20x | 35x | 
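
For throughput metrics such as Mpush/Sec, one way to read the multi-GPU columns is as a scaling efficiency: throughput on n GPUs divided by n times the single-GPU throughput. The following is a minimal sketch using the GTC values above; the helper name `scaling_efficiency` is an assumption made for illustration and is not part of any benchmark tooling.

```python
def scaling_efficiency(throughput_by_gpus: dict[int, float]) -> dict[int, float]:
    """Return throughput(n) / (n * throughput(1)) for each GPU count n."""
    base = throughput_by_gpus[1]
    return {n: t / (n * base) for n, t in sorted(throughput_by_gpus.items())}


# Mpush/Sec values copied from the GTC row above.
gtc_h100_sxm = {1: 769.0, 2: 1436.0, 4: 2805.0, 8: 5235.0}
gtc_h100_nvl = {1: 741.0, 2: 1396.0, 4: 2679.0, 8: 4819.0}

sxm_eff = scaling_efficiency(gtc_h100_sxm)
nvl_eff = scaling_efficiency(gtc_h100_nvl)

for n in sxm_eff:
    print(f"{n} GPUs: SXM {sxm_eff[n]:.1%}, NVL {nvl_eff[n]:.1%}")
```

An efficiency of 100% would mean perfectly linear scaling; values below that indicate the share of single-GPU throughput retained per GPU as the count grows.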
LAMMPS
 
  Molecular Dynamics
Classical molecular dynamics package
VERSION
patch_4Feb2025
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.95E+08 | 1.33E+09 | 2.47E+09 | 4.38E+09 | 7.42E+09 | 1.16E+09 | 1.90E+09 | 3.39E+09 | 6.04E+09 | 
| LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 6x | 12x | 20x | 3x | 5x | 9x | 16x | 
| LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.44E+08 | 5.34E+08 | 1.02E+09 | 1.82E+09 | 3.02E+09 | 5.10E+08 | 8.51E+08 | 1.49E+09 | 2.49E+09 | 
| LAMMPS [EAM] | NRF | EAM | yes | 1x | 4x | 7x | 13x | 22x | 4x | 6x | 11x | 18x | 
| LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.93E+06 | 1.07E+07 | 1.93E+07 | 3.15E+07 | 4.77E+07 | 9.49E+06 | 1.72E+07 | 2.89E+07 | 4.23E+07 | 
| LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 6x | 15x | 24x | 37x | 5x | 13x | 22x | 33x | 
| LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.61E+06 | 4.16E+06 | 8.35E+06 | 1.65E+07 | 3.29E+07 | 3.65E+06 | 6.37E+06 | 1.20E+07 | 2.68E+07 | 
| LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 3x | 7x | 12x | 24x | 2x | 5x | 9x | 19x | 
| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.21E+08 | 1.00E+09 | 1.79E+09 | 3.35E+09 | 5.69E+09 | 8.68E+08 | 1.49E+09 | 2.84E+09 | - | 
| LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 10x | 18x | 30x | 4x | 7x | 15x | - | 
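
The tables mix several number formats: thousands separators (`1,333`), scientific notation (`3.95E+08`), an `x` suffix on NRF values, and `-` where no result is reported. A small parser along these lines can normalize the value cells before further analysis. This is a sketch under the assumption that rows are pipe-delimited exactly as shown; `parse_cell` and `parse_row` are hypothetical helper names.

```python
from typing import Optional


def parse_cell(cell: str) -> Optional[float]:
    """Convert one numeric table cell to a float; '-' (no result) becomes None."""
    text = cell.strip()
    if text in ("", "-"):
        return None
    # Drop thousands separators and a trailing NRF-style 'x' suffix.
    text = text.replace(",", "")
    if text.endswith("x"):
        text = text[:-1]
    return float(text)


def parse_row(line: str) -> list[str]:
    """Split a pipe-delimited row into stripped cells, ignoring the outer pipes."""
    return [c.strip() for c in line.strip().strip("|").split("|")]


row = (
    "| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.21E+08 | 1.00E+09 "
    "| 1.79E+09 | 3.35E+09 | 5.69E+09 | 8.68E+08 | 1.49E+09 | 2.84E+09 | - | "
)
cells = parse_row(row)
# The first four cells are labels; only the remaining cells are numeric.
values = [parse_cell(c) for c in cells[4:]]
print(values)
```

Returning `None` for `-` keeps missing results distinct from zero, so downstream averages or ratios can skip them explicitly.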
MILC
 
  Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_cde2498
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MILC | Total Time (sec) | Apex Medium | no | 13,735 | 1,173 | 632 | 356 | 216 | 1,212 | 679 | 373 | 266 | 
| MILC | NRF | Apex Medium | yes | 1x | 12x | 19x | 34x | 57x | 11x | 18x | 33x | 46x | 
NAMD
 
  Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 100.72 | 299 | 596 | 1,181 | 2,300 | 273 | 550 | 1,106 | 2,209 | 
| NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 6x | 12x | 23x | 3x | 5x | 11x | 22x | 
| NAMD [LaINDY ColVars] | ns/day | LaINDY ColVars | yes | 50.56 | 87 | 174 | 346 | 689 | 84 | 162 | 325 | 646 | 
| NAMD [LaINDY ColVars] | NRF | LaINDY ColVars | yes | 1x | 2x | 3x | 7x | 14x | 2x | 3x | 6x | 13x | 
| NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 108.79 | 381 | 757 | 1,494 | 2,935 | 353 | 706 | 1,412 | 2,737 | 
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 7x | 14x | 27x | 3x | 6x | 13x | 25x | 
| NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 10.53 | 24 | 49 | 97 | 196 | 23 | 46 | 92 | 184 | 
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 5x | 9x | 19x | 2x | 4x | 9x | 18x | 
| NAMD [COVID-19 Spike Assembly] | ns/day | COVID-19 Spike Assembly | yes | 0.75 | 3 | 6 | 11 | 18 | 3 | 5 | 8 | - | 
| NAMD [COVID-19 Spike Assembly] | NRF | COVID-19 Spike Assembly | yes | 1x | 4x | 8x | 14x | 24x | 4x | 6x | 10x | - | 
| NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 10.87 | 31 | 62 | 123 | 247 | 29 | 57 | 114 | 227 | 
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 6x | 11x | 23x | 3x | 5x | 10x | 21x | 
NAMD is measured by running multiple independent instances using MPS, except for the NAMD [COVID-19 Spike Assembly] dataset
Trifan A, Gorgun D, Salim M, et al. Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action. The International Journal of High Performance Computing Applications. 2022;36(5-6):603-623. doi:10.1177/10943420221113513
      D. B. Sauer, N. Trebesch, J. J. Marden, N. Cocco, J. Song, A. Koide, S. Koide, E. Tajkhorshid, and D.-N. Wang. "Structural basis for the reaction cycle of DASS dicarboxylate transporters." eLife. 9, e61350 (2020). https://doi.org/10.7554/eLife.61350
RELION
 
  Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
5.0.0
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|
| Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 8,981 | 2,137 | 1,288 | 1,059 | 1,005 | 2,458 | 1,219 | 1,035 | 
| Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 8x | 9x | 4x | 7x | 9x | 
RTM
 
  Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
nvidia_2024_01
ACCELERATED FEATURES
- Batch algorithm
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 21,047 | 157,252 | 313,545 | 625,242 | 1,250,439 | 153,662 | 292,630 | 589,562 | 1,214,770 | 
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 7x | 15x | 30x | 59x | 7x | 14x | 28x | 58x | 
| RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 7,213 | 30,824 | 61,529 | 122,504 | 244,246 | 25,597 | 49,607 | 94,039 | 197,162 | 
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 4x | 9x | 17x | 34x | 4x | 7x | 13x | 27x | 
| RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 7,213 | 26,711 | 53,090 | 105,394 | 210,086 | 23,978 | 46,001 | 92,576 | 186,835 | 
| RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 15x | 29x | 3x | 6x | 13x | 26x | 
SPECFEM3D
 
  Geoscience
Simulates seismic wave propagation
VERSION
4.1.1
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 186 | 46 | 24 | 14 | 10 | 50 | 26 | 14 | 9 | 
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 4x | 9x | 16x | 22x | 4x | 6x | 15x | 24x | 
Detailed L40S application performance data is located below in alphabetical order.
AMBER
 
  Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
24-AT_24
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
|---|---|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 11.71 | 179 | 356 | 728 | 1,582 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 15x | 30x | 62x | 135x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 11.69 | 183 | 372 | 739 | 1,580 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 16x | 32x | 63x | 135x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 93.36 | 977 | 2,004 | 4,017 | 8,935 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 10x | 21x | 43x | 96x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 99.50 | 1,020 | 2,060 | 4,166 | 9,026 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 10x | 21x | 42x | 91x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 377.04 | 4,150 | 8,389 | 17,112 | 35,769 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 11x | 22x | 45x | 95x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 397.04 | 4,240 | 8,706 | 17,762 | - |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 22x | 45x | - |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.69 | 74 | 148 | 296 | 592 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 20x | 40x | 80x | 160x |
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.07 | 194 | 388 | 776 | 1,552 |
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 15x | 31x | 62x |
AMBER is measured by running multiple independent instances using MPS
Chroma
 
  Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V2025.01
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 2x L40S | 4x L40S | 8x L40S |
|---|---|---|---|---|---|---|---|
| Chroma | Final Timestep Time (Sec) | HMC Medium | no | 10,037 | 367 | 343 | 152 |
| Chroma | NRF | HMC Medium | yes | 1x | 28x | 30x | 67x |
FUN3D
 
  Engineering
Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology
VERSION
14.1
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 4x L40S | 8x L40S |
|---|---|---|---|---|---|---|
| Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 154 | 41 | 23 |
| Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 6x | 10x |
| Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 474 | 105 | 57 |
| Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 7x | 12x |
| Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 628 | 165 | 89 |
| Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | 5x | 9x |
| Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 2,011 | - | 237 |
| Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | 12x |
GROMACS
 
  Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
h-bond - 2025-rc
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
|---|---|---|---|---|---|---|---|---|
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 362 | 640 | 1,353 | 2,712 | 5,520 |
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 4x | 7x | 15x |
| GROMACS [STMV] | ns/day | STMV | yes | 20 | 44 | 73 | 113 | - |
| GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 6x | - |
GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS
GTC
 
  Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
|---|---|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | mpi#proc.in | yes | 146 | 439 | 726 | 1,583 | 3,007 |
| GTC | NRF | mpi#proc.in | yes | 1x | 3x | 5x | 12x | 22x |
MILC
 
  Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_cde2498
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x L40S | 2x L40S | 4x L40S |
|---|---|---|---|---|---|---|---|
| MILC | Total Time (sec) | Apex Medium | no | 13,735 | 4,046 | 2,047 | 1,438 |
| MILC | NRF | Apex Medium | yes | 1x | 3x | 6x | 8x |
NAMD
 
  Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
3
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
|---|---|---|---|---|---|---|---|---|
| NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 100.72 | 230 | 457 | 900 | 1,816 |
| NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 2x | 5x | 9x | 18x |
| NAMD [LaINDY ColVars] | ns/day | LaINDY ColVars | yes | 50.56 | 62 | 125 | 248 | 496 |
| NAMD [LaINDY ColVars] | NRF | LaINDY ColVars | yes | 1x | 1x | 2x | 5x | 10x |
| NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 108.79 | 300 | 597 | 1,200 | 2,354 |
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 3x | 5x | 11x | 22x |
| NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 10.53 | 17 | 34 | 68 | 136 |
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 3x | 6x | 13x |
| NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 10.87 | 23 | 46 | 92 | 183 |
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 2x | 4x | 8x | 17x |
NAMD is measured by running multiple independent instances using MPS, except for the NAMD [COVID-19 Spike Assembly] dataset
Trifan A, Gorgun D, Salim M, et al. Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action. The International Journal of High Performance Computing Applications. 2022;36(5-6):603-623. doi:10.1177/10943420221113513
          D. B. Sauer, N. Trebesch, J. J. Marden, N. Cocco, J. Song, A. Koide, S. Koide, E. Tajkhorshid, and D.-N. Wang. "Structural basis for the reaction cycle of DASS dicarboxylate transporters." eLife. 9, e61350 (2020). https://doi.org/10.7554/eLife.61350
RTM
 
  Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
nvidia_2024_01
ACCELERATED FEATURES
- Batch algorithm
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S |
|---|---|---|---|---|---|---|---|---|
| RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 21,047 | 42,366 | 84,432 | 168,028 | 336,068 |
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 4x | 8x | 16x |
| RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 7,213 | 14,644 | 28,937 | 57,176 | 114,205 |
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 8x | 16x |
Detailed L4 application performance data is located below in alphabetical order.
AMBER
 
  Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
24-AT_24
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9684X (CPU-Only) | 1x L4 | 2x L4 | 4x L4 | 8x L4 |
|---|---|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 11.71 | 55 | 109 | 220 | 440 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 5x | 9x | 19x | 38x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 11.69 | 56 | 111 | 220 | 442 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 5x | 10x | 19x | 38x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 93.36 | 266 | 536 | 1,065 | 2,145 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 3x | 6x | 11x | 23x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 99.50 | 272 | 544 | 1,093 | 2,231 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 3x | 5x | 11x | 22x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 377.04 | 1,281 | 2,519 | 5,144 | 10,383 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 3x | 7x | 14x | 28x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 397.04 | 1,280 | 2,567 | 5,176 | 10,395 |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 3x | 6x | 13x | 26x |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.69 | 21 | 41 | 83 | 166 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 6x | 11x | 22x | 45x |
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 25.07 | 113 | 226 | 451 | 902 |
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 4x | 9x | 18x | 36x |
AMBER is measured by running multiple independent instances using MPS
GTC
 
  Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V4.5 updated
ACCELERATED FEATURES
- Push, shift, and collision
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 4x L4 | 8x L4 |
|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | mpi#proc.in | yes | 136 | 657 | 1,244 |
| GTC | NRF | mpi#proc.in | yes | 1x | 5x | 10x |
MILC
 
  Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
develop_cde2498
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
| Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 2x L4 | 4x L4 | 8x L4 |
|---|---|---|---|---|---|---|---|
| MILC | Total Time (sec) | Apex Medium | no | 16,570 | 5,873 | 3,000 | 1,618 |
| MILC | NRF | Apex Medium | yes | 1x | 3x | 5x | 9x |