Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. NVIDIA Data Center GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers. The NRF varies by application.
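The NRF procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact method used to produce the numbers on this page: the function names `node_replacement_factor` and `rate_from_time`, the piecewise-linear interpolation between measured server counts, the proportional extrapolation from the largest measured point, and the sample CPU scaling data are all hypothetical.

```python
from bisect import bisect_left


def node_replacement_factor(cpu_nodes, cpu_perf, gpu_perf):
    """Estimate how many CPU-only servers match one GPU-accelerated server.

    cpu_nodes: measured CPU-only server counts, increasing, e.g. [1, 2, 4, 8]
    cpu_perf:  throughput at each count (bigger is better, strictly increasing)
    gpu_perf:  throughput of the GPU-accelerated server, in the same units
    """
    if not cpu_nodes or len(cpu_nodes) != len(cpu_perf):
        raise ValueError("cpu_nodes and cpu_perf must be non-empty and equal length")

    # At or below the smallest measured configuration: scale it down proportionally.
    if gpu_perf <= cpu_perf[0]:
        return cpu_nodes[0] * gpu_perf / cpu_perf[0]

    # Beyond the largest measured configuration: assume throughput keeps growing
    # in proportion to server count (the assumed "linear scaling" step).
    if gpu_perf > cpu_perf[-1]:
        return cpu_nodes[-1] * gpu_perf / cpu_perf[-1]

    # Inside the measured range: interpolate between the neighbouring points.
    i = bisect_left(cpu_perf, gpu_perf)
    n0, n1 = cpu_nodes[i - 1], cpu_nodes[i]
    p0, p1 = cpu_perf[i - 1], cpu_perf[i]
    return n0 + (n1 - n0) * (gpu_perf - p0) / (p1 - p0)


def rate_from_time(seconds):
    """Convert a 'smaller is better' time into a 'bigger is better' rate."""
    return 1.0 / seconds


# Hypothetical CPU-only scaling data (NOT taken from the tables on this page).
nodes = [1, 2, 4, 8]
perf = [10.0, 19.0, 36.0, 64.0]
print(node_replacement_factor(nodes, perf, 27.5))   # inside the measured range
print(node_replacement_factor(nodes, perf, 294.0))  # extrapolated beyond 8 servers
```

For metrics where smaller is better (for example, total time in seconds), the times would first be converted with `rate_from_time` so that all inputs are throughputs. Because the tables below list only the single CPU-only server result, the published NRF values generally cannot be reproduced by dividing one column by another; they also depend on the multi-server CPU measurements.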


Detailed H200 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

24-AT_24

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.40 | 294 | 587 | 1,181 | 327 | 652 | 1,333 | 2,664
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 28x | 56x | 114x | 31x | 63x | 128x | 256x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.43 | 299 | 596 | 1,199 | 330 | 669 | 1,395 | 2,782
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 29x | 57x | 115x | 32x | 64x | 134x | 267x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 82.11 | 1,273 | 2,543 | 5,127 | 1,406 | 2,852 | 5,690 | 12,468
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 16x | 31x | 62x | 17x | 35x | 69x | 152x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 90.62 | 1,300 | 2,590 | 5,244 | 1,430 | 2,897 | 5,863 | 11,854
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 14x | 29x | 58x | 16x | 32x | 65x | 131x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 358.07 | 4,318 | 8,644 | 15,856 | 4,689 | 9,485 | 19,479 | 37,687
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 44x | 13x | 26x | 54x | 105x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 365.31 | 4,462 | 8,789 | 16,318 | 4,851 | 9,692 | 19,759 | 38,246
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 45x | 13x | 27x | 54x | 105x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.28 | 87 | 173 | 347 | 94 | 187 | 375 | 749
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 26x | 53x | 106x | 29x | 57x | 114x | 228x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 23.08 | 184 | 368 | 735 | 200 | 400 | 799 | 1,599
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 16x | 32x | 9x | 17x | 35x | 69x

AMBER is measured by running multiple independent instances using MPS


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2024.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 1x H200 | 4x H200 | 8x H200
Chroma | Final Timestep Time (Sec) | HMC Medium | no | 9,240 | 161 | 94 | 59 | 154 | 53 | 36
Chroma | NRF | HMC Medium | yes | 1x | 59x | 100x | 159x | 61x | 177x | 265x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 127 | 25 | 15 | 9 | 24 | 14 | 9 | 7
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 7x | 11x | 17x | 7x | 11x | 18x | 22x
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 179 | 37 | 21 | 13 | 35 | 20 | 12 | 9
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 8x | 13x | 22x | 8x | 14x | 23x | 30x
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 498 | 111 | 60 | 35 | 100 | 54 | 32 | 21
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 7x | 12x | 21x | 7x | 14x | 23x | 35x
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 682 | - | - | 44 | - | - | 42 | 26
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 21x | - | - | 22x | 35x
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 2,155 | - | - | 135 | - | - | 123 | 72
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 23x | - | - | 26x | 44x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2024.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 370 | 756 | 1,372 | 2,511 | 5,017 | 792 | 1,535 | 2,624 | 5,275
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 4x | 7x | 14x | 2x | 4x | 7x | 14x
GROMACS [STMV] | ns/day | STMV | yes | 19 | 40 | 67 | 112 | - | 43 | 73 | 123 | 179
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 7x | - | 2x | 4x | 8x | 12x

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5 updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
GTC | Mpush/Sec | mpi#proc.in | yes | 136 | 752 | 1,400 | 2,715 | 4,729 | 821 | 1,532 | 2,999 | 5,408
GTC | NRF | mpi#proc.in | yes | 1x | 6x | 11x | 21x | 37x | 6x | 12x | 23x | 42x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2024.8_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 | 1x H200 NVL | 1x H200 | 2x H200 | 4x H200
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 459 | 182 | 143 | 102 | 79
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 3x | 3x | 4x | 6x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Aug2024

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.28E+08 | 1.31E+09 | 2.47E+09 | 3.69E+09 | - | 1.44E+09 | 2.71E+09 | 4.77E+09 | 7.95E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 4x | 8x | 12x | - | 4x | 9x | 15x | 25x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.33E+08 | 5.26E+08 | 1.01E+09 | 1.70E+09 | - | 5.72E+08 | 1.10E+09 | 1.97E+09 | 3.22E+09
LAMMPS [EAM] | NRF | EAM | yes | 1x | 4x | 8x | 14x | - | 4x | 9x | 16x | 26x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.84E+06 | 1.09E+07 | 1.94E+07 | 3.01E+07 | 2.83E+07 | 1.17E+07 | 2.09E+07 | 3.39E+07 | 5.07E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 9x | 16x | 24x | 23x | 9x | 17x | 27x | 41x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.53E+06 | 3.46E+06 | 6.91E+06 | 1.38E+07 | 2.70E+07 | 3.81E+06 | 7.58E+06 | 1.51E+07 | 3.01E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 2x | 6x | 10x | 20x | 3x | 7x | 11x | 23x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.99E+08 | 9.34E+08 | 1.74E+09 | 3.04E+09 | 4.95E+09 | 1.02E+09 | 1.92E+09 | 3.48E+09 | 5.89E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 10x | 18x | 29x | 5x | 11x | 21x | 35x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_cde2498

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
MILC | Total Time (sec) | Apex Medium | no | 16,570 | 1,028 | 561 | 350 | 981 | 534 | 305 | 191
MILC | NRF | Apex Medium | yes | 1x | 14x | 26x | 42x | 15x | 28x | 48x | 77x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
NAMD [LaINDY ColVars] | ns/day | LaINDY ColVars | yes | 44.89 | 85 | 169 | 324 | 650 | 91 | 180 | 358 | 711
NAMD [LaINDY ColVars] | NRF | LaINDY ColVars | yes | 1x | 2x | 4x | 7x | 14x | 2x | 4x | 8x | 16x
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 97.16 | 360 | 721 | 1,425 | 2,754 | 396 | 785 | 1,569 | 3,105
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 7x | 15x | 28x | 4x | 8x | 16x | 32x
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 10.06 | 24 | 48 | 95 | 190 | 26 | 52 | 103 | 207
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 5x | 9x | 19x | 3x | 5x | 10x | 21x
NAMD [COVID-19 Spike Assembly] | ns/day | COVID-19 Spike Assembly | yes | 0.78 | 3 | 5 | 8 | - | 3 | 6 | 11 | 18
NAMD [COVID-19 Spike Assembly] | NRF | COVID-19 Spike Assembly | yes | 1x | 4x | 6x | 10x | - | 4x | 8x | 15x | 23x
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 10.49 | 29 | 58 | 117 | 233 | 32 | 65 | 129 | 258
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 6x | 11x | 22x | 3x | 6x | 12x | 25x

NAMD is measured by running multiple independent instances using MPS, except for the NAMD [COVID-19 Spike Assembly] dataset.
Trifan A, Gorgun D, Salim M, et al. Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action. The International Journal of High Performance Computing Applications. 2022;36(5-6):603-623. doi:10.1177/10943420221113513
D. B. Sauer, N. Trebesch, J. J. Marden, N. Cocco, J. Song, A. Koide, S. Koide, E. Tajkhorshid, and D.-N. Wang. "Structural basis for the reaction cycle of DASS dicarboxylate transporters." eLife. 9, e61350 (2020). https://doi.org/10.7554/eLife.61350


Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.3

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 2x H200 | 4x H200 | 8x H200
Quantum Espresso | Total CPU Time (Sec) | GRIR443 | no | 687 | 143 | 107 | 65 | 172 | 106 | 79
Quantum Espresso | NRF | GRIR443 | yes | 1x | 9x | 12x | 19x | 7x | 12x | 16x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

5.0 beta-2 (a0b145a)

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 1x H200 | 2x H200
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosime on Relion-3.0 | no | 10,571 | 2,092 | 1,265 | 2,024 | 1,201
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosime on Relion-3.0 | yes | 1x | 5x | 8x | 5x | 9x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2024_01

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 21,047 | 184,956 | 368,654 | 736,713 | 1,476,608 | 194,141 | 385,616 | 770,672 | 1,545,937
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 9x | 18x | 35x | 70x | 9x | 18x | 37x | 73x
RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 7,213 | 29,008 | 57,948 | 115,648 | 230,775 | 31,581 | 62,562 | 125,334 | 250,342
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 4x | 8x | 16x | 32x | 4x | 9x | 17x | 35x
RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 7,213 | 28,695 | 57,103 | 113,542 | 226,996 | 30,527 | 59,893 | 119,536 | 238,880
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 8x | 16x | 31x | 4x | 8x | 17x | 33x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

4.1.1

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H200 NVL | 2x H200 NVL | 4x H200 NVL | 8x H200 NVL | 1x H200 | 2x H200 | 4x H200 | 8x H200
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 199 | 41 | 22 | 12 | 9 | 38 | 21 | 12 | 9
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 4x | 10x | 18x | 25x | 4x | 11x | 19x | 26x


Detailed GH200 96GB application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

24-AT_24

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.40 | 305 | 1,296
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 29x | 125x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.43 | 307 | 1,302
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 29x | 125x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 82.11 | 1,339 | 5,510
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 16x | 67x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 90.62 | 1,370 | 5,642
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 62x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 358.07 | 4,827 | 18,286
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 13x | 51x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 365.31 | 4,916 | 18,673
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 51x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.28 | 101 | -
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 31x | -
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 23.08 | 205 | -
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 9x | -

AMBER is measured by running multiple independent instances using MPS


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2024.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
Chroma | Final Timestep Time (Sec) | HMC Medium | no | 9,240 | 164 | 61
Chroma | NRF | HMC Medium | yes | 1x | 58x | 155x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 127 | 24 | 10
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 7x | 17x
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 179 | 36 | 13
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 8x | 21x
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 498 | 105 | 38
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 7x | 19x
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 682 | - | 48
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | 19x
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 2,155 | - | 138
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | 23x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2024.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 370 | 834 | 3,293
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 9x
GROMACS [STMV] | ns/day | STMV | yes | 19 | 47 | 120
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 8x

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5 updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
GTC | Mpush/Sec | mpi#proc.in | yes | 136 | 812 | 2,874
GTC | NRF | mpi#proc.in | yes | 1x | 6x | 22x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2024.8_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 575 | 175 | 108
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160 km resolution | no | 459 | 147 | 81
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160 km resolution | yes | 1x | 3x | 6x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Aug2024

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.28E+08 | 1.56E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.33E+08 | 6.10E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 5x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.84E+06 | 1.14E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 9x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.53E+06 | 3.83E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 3x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.99E+08 | 1.08E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 6x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_cde2498

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
MILC | Total Time (sec) | Apex Medium | no | 16,570 | 935 | 306
MILC | NRF | Apex Medium | yes | 1x | 16x | 48x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
NAMD [LaINDY ColVars] | ns/day | LaINDY ColVars | yes | 44.89 | 114 | 441
NAMD [LaINDY ColVars] | NRF | LaINDY ColVars | yes | 1x | 3x | 10x
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 97.16 | 392 | 1,505
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 15x
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 10.06 | 26 | 102
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 3x | 10x
NAMD [COVID-19 Spike Assembly] | ns/day | COVID-19 Spike Assembly | yes | 0.78 | 3 | 11
NAMD [COVID-19 Spike Assembly] | NRF | COVID-19 Spike Assembly | yes | 1x | 4x | 14x
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 10.49 | 32 | 126
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 12x

NAMD is measured by running multiple independent instances using MPS, except for the NAMD [COVID-19 Spike Assembly] dataset.
Trifan A, Gorgun D, Salim M, et al. Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action. The International Journal of High Performance Computing Applications. 2022;36(5-6):603-623. doi:10.1177/10943420221113513
D. B. Sauer, N. Trebesch, J. J. Marden, N. Cocco, J. Song, A. Koide, S. Koide, E. Tajkhorshid, and D.-N. Wang. "Structural basis for the reaction cycle of DASS dicarboxylate transporters." eLife. 9, e61350 (2020). https://doi.org/10.7554/eLife.61350


RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2024_01

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 21,047 | 178,321 | 708,595
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 34x
RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 7,213 | 31,584 | 124,223
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 4x | 17x
RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 7,213 | 29,320 | 115,804
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 16x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

4.1.1

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x GH200 96GB | 4x GH200 96GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 199 | 41 | 13
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 4x | 18x


Detailed H100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

24-AT_24

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.40 | 276 | 552 | 1,107 | 308 | 616 | 1,262 | 2,476
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 27x | 53x | 106x | 30x | 59x | 121x | 238x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.43 | 280 | 560 | 1,120 | 314 | 629 | 1,269 | 2,595
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 27x | 54x | 107x | 30x | 60x | 122x | 249x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 82.11 | 1,225 | 2,465 | 4,982 | 1,335 | 2,664 | 5,397 | 11,295
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 15x | 30x | 61x | 16x | 32x | 66x | 138x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 90.62 | 1,208 | 2,512 | 5,007 | 1,365 | 2,740 | 5,606 | 11,840
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 13x | 28x | 55x | 15x | 30x | 62x | 131x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 358.07 | 4,270 | 8,528 | 16,738 | 4,573 | 9,286 | 18,515 | 36,090
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 47x | 13x | 26x | 52x | 101x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 365.31 | 4,358 | 8,770 | 17,104 | 4,729 | 9,395 | 19,265 | 38,119
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 47x | 13x | 26x | 53x | 104x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.28 | 85 | 171 | 342 | 89 | 178 | 357 | 713
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 26x | 52x | 104x | 27x | 54x | 109x | 217x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 23.08 | 181 | 362 | 723 | 193 | 386 | 771 | 1,543
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 16x | 31x | 8x | 17x | 33x | 67x

AMBER is measured by running multiple independent instances using MPS


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2024.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 1x H100 SXM | 4x H100 SXM | 8x H100 SXM
Chroma | Final Timestep Time (Sec) | HMC Medium | no | 9,240 | 197 | 112 | 69 | 264 | 63 | 40
Chroma | NRF | HMC Medium | yes | 1x | 48x | 84x | 137x | 36x | 150x | 234x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 127 | 29 | 17 | 10 | 10 | 27 | 16 | 10 | 8
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 6x | 10x | 16x | 17x | 6x | 10x | 17x | 22x
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 179 | 43 | 24 | 14 | 14 | 40 | 22 | 14 | 10
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 7x | 12x | 20x | 19x | 7x | 13x | 21x | 29x
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 498 | 127 | 67 | 38 | 31 | 116 | 62 | 36 | 23
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 6x | 11x | 19x | 24x | 6x | 12x | 20x | 32x
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 682 | - | - | 50 | 38 | - | - | 46 | 28
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | - | 18x | 24x | - | - | 20x | 32x
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 2,155 | - | - | 151 | 98 | - | - | 140 | 80
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 21x | 32x | - | - | 23x | 39x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2024.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 370 | 684 | 1,428 | 2,511 | 5,146 | 716 | 1,509 | 2,589 | 5,256
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 4x | 7x | 14x | 2x | 4x | 7x | 14x
GROMACS [STMV] | ns/day | STMV | yes | 19 | 39 | 65 | 94 | - | 43 | 72 | 120 | 174
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 3x | 6x | - | 2x | 4x | 8x | 12x

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5 updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
GTC | Mpush/Sec | mpi#proc.in | yes | 136 | 746 | 1,398 | 2,657 | 4,651 | 769 | 1,436 | 2,805 | 5,235
GTC | NRF | mpi#proc.in | yes | 1x | 6x | 11x | 21x | 36x | 6x | 11x | 22x | 41x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Aug2024

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 3.28E+08 | 1.14E+09 | 1.88E+09 | 3.27E+09 | 5.81E+09 | 1.33E+09 | 2.49E+09 | 4.44E+09 | 7.56E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 6x | 10x | 19x | 4x | 8x | 14x | 24x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 1.33E+08 | 4.86E+08 | - | 1.52E+09 | - | 5.34E+08 | 1.03E+09 | 1.85E+09 | 3.06E+09
LAMMPS [EAM] | NRF | EAM | yes | 1x | 4x | - | 12x | - | 4x | 8x | 15x | 24x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 1.84E+06 | 9.83E+06 | 1.68E+07 | 2.89E+07 | 4.08E+07 | 1.09E+07 | 1.95E+07 | 3.20E+07 | 4.84E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 7x | 14x | 23x | 33x | 9x | 16x | 26x | 39x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.53E+06 | 3.26E+06 | 5.88E+06 | 1.17E+07 | 2.56E+07 | 3.72E+06 | 7.43E+06 | 1.48E+07 | 2.95E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 2x | 5x | 9x | 19x | 3x | 6x | 11x | 22x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 1.99E+08 | 8.87E+08 | 1.46E+09 | 2.63E+09 | 4.81E+09 | 9.99E+08 | 1.86E+09 | 3.37E+09 | 5.73E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 9x | 16x | 28x | 5x | 11x | 20x | 34x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_cde2498

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
MILC | Total Time (sec) | Apex Medium | no | 16,570 | 1,263 | 682 | 379 | 301 | 1,173 | 632 | 356 | 216
MILC | NRF | Apex Medium | yes | 1x | 12x | 22x | 39x | 49x | 13x | 23x | 41x | 68x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 87.87 | 273 | 550 | 1,106 | 2,209 | 299 | 596 | 1,181 | 2,300
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 6x | 13x | 25x | 3x | 7x | 13x | 26x
NAMD [LaINDY ColVars] | ns/day | LaINDY ColVars | yes | 44.89 | 84 | 163 | 327 | 649 | 89 | 178 | 350 | 698
NAMD [LaINDY ColVars] | NRF | LaINDY ColVars | yes | 1x | 2x | 4x | 7x | 14x | 2x | 4x | 8x | 16x
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 97.16 | 341 | 694 | 1,377 | 2,808 | 387 | 764 | 1,519 | 2,981
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 7x | 14x | 29x | 4x | 8x | 16x | 31x
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 10.06 | 23 | 46 | 93 | 186 | 25 | 50 | 100 | 200
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 5x | 9x | 19x | 2x | 5x | 10x | 20x
NAMD [COVID-19 Spike Assembly] | ns/day | COVID-19 Spike Assembly | yes | 0.78 | 3 | 4 | 7 | - | 3 | 6 | 11 | 18
NAMD [COVID-19 Spike Assembly] | NRF | COVID-19 Spike Assembly | yes | 1x | 4x | 6x | 9x | - | 4x | 8x | 14x | 23x
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 10.49 | 27 | 55 | 110 | 222 | 31 | 61 | 123 | 245
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 5x | 10x | 21x | 3x | 6x | 12x | 23x

NAMD is measured by running multiple independent instances using MPS, except for the NAMD [COVID-19 Spike Assembly] dataset.
Trifan A, Gorgun D, Salim M, et al. Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action. The International Journal of High Performance Computing Applications. 2022;36(5-6):603-623. doi:10.1177/10943420221113513
D. B. Sauer, N. Trebesch, J. J. Marden, N. Cocco, J. Song, A. Koide, S. Koide, E. Tajkhorshid, and D.-N. Wang. "Structural basis for the reaction cycle of DASS dicarboxylate transporters." eLife. 9, e61350 (2020). https://doi.org/10.7554/eLife.61350


RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

5.0 beta-2 (a0b145a)

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosime on Relion-3.0 | no | 10,571 | 1,999 | 1,258 | 1,044 | 2,045 | 1,209 | 1,055
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosime on Relion-3.0 | yes | 1x | 5x | 8x | 10x | 5x | 9x | 10x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2024_01

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 21,047 | 148,536 | 281,767 | 567,679 | 1,173,655 | 157,252 | 313,545 | 625,242 | 1,250,439
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 7x | 13x | 27x | 56x | 7x | 15x | 30x | 59x
RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 7,213 | 23,644 | 46,249 | 93,247 | 187,002 | 30,824 | 61,529 | 122,504 | 244,246
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x | 6x | 13x | 26x | 4x | 9x | 17x | 34x
RTM [TTI RX 2Pass mgpu] | Mcell/s | TTI RX 2Pass mgpu | yes | 7,213 | 23,094 | 44,689 | 89,743 | 181,405 | 26,711 | 53,090 | 105,394 | 210,086
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 3x | 6x | 12x | 25x | 4x | 7x | 15x | 29x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

4.1.1

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x H100 NVL | 2x H100 NVL | 4x H100 NVL | 8x H100 NVL | 1x H100 SXM | 2x H100 SXM | 4x H100 SXM | 8x H100 SXM
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 199 | 50 | 26 | 14 | 9 | 46 | 24 | 14 | 10
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 3x | 9x | 16x | 24x | 4x | 9x | 17x | 23x


Detailed L40S application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

24-AT_24

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 82.11 | 962 | 1,915 | 3,920 | 8,163
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 12x | 23x | 48x | 99x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 90.62 | 990 | 1,991 | 4,001 | 8,330
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 11x | 22x | 44x | 92x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 358.07 | 4,092 | 8,318 | 16,724 | 32,399
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 11x | 23x | 47x | 90x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 365.31 | 4,208 | 8,465 | 17,125 | 33,569
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 23x | 47x | 92x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.28 | 72 | 144 | 287 | 575
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 22x | 44x | 88x | 175x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 23.08 | 191 | 381 | 762 | 1,525
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 8x | 17x | 33x | 66x

AMBER is measured by running multiple independent instances using MPS
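
For throughput metrics such as ns/day, the NRF rows above track the ratio of the GPU result to the CPU-only baseline, rounded to a whole number. The sketch below reproduces the AMBER [PME-FactorIX_NPT_4fs] row that way. The helper name `approx_nrf` and the rounding rule are assumptions made for illustration, not the published method: NRF on this page is derived from measurements on up to 8 CPU-only servers with linear scaling beyond that, so a simple ratio will not match every row.

```python
def approx_nrf(cpu_value, gpu_value, bigger_is_better=True):
    """Approximate NRF as a speedup ratio over the CPU-only baseline.

    Hypothetical helper: the published figures come from multi-server CPU
    measurements, so treat this as a rough estimate only.
    """
    if bigger_is_better:
        ratio = gpu_value / cpu_value   # throughput metric, e.g. ns/day
    else:
        ratio = cpu_value / gpu_value   # runtime metric, e.g. seconds
    return round(ratio)


# AMBER [PME-FactorIX_NPT_4fs] on L40S, ns/day (values taken from the table).
cpu_ns_per_day = 82.11
gpu_ns_per_day = {1: 962, 2: 1915, 4: 3920, 8: 8163}

nrf_by_gpu_count = {
    gpus: approx_nrf(cpu_ns_per_day, value)
    for gpus, value in gpu_ns_per_day.items()
}

for gpus, nrf in nrf_by_gpu_count.items():
    print(f"{gpus}x L40S: ~{nrf}x")
```

For metrics where smaller is better, such as total time in seconds, pass `bigger_is_better=False` so the ratio is inverted.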


Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2024.10

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 2x L40S | 8x L40S
Chroma | Final Timestep Time (Sec) | HMC Medium | no | 9,240 | 367 | 155
Chroma | NRF | HMC Medium | yes | 1x | 26x | 61x

FUN3D

Engineering

Suite of tools, actively developed at NASA for aeronautics and space technology, for modeling fluid flow

VERSION

14.0.1

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 2x L40S | 4x L40S | 8x L40S
Fun3D [dpw_wbt0_crs-3.6Mn_5] | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 127 | - | 34 | 19
Fun3D [dpw_wbt0_crs-3.6Mn_5] | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | - | 5x | 9x
Fun3D [waverider-5M] | Loop Time (Sec) | waverider-5M | no | 179 | 85 | 45 | 25
Fun3D [waverider-5M] | NRF | waverider-5M | yes | 1x | 3x | 6x | 11x
Fun3D [waverider-5M w/chemistry] | Loop Time (Sec) | waverider-5M w/chemistry | no | 498 | 241 | 127 | 70
Fun3D [waverider-5M w/chemistry] | NRF | waverider-5M w/chemistry | yes | 1x | 2x | 6x | 10x
Fun3D [waverider-20M] | Loop Time (Sec) | waverider-20M | no | 682 | - | 178 | 97
Fun3D [waverider-20M] | NRF | waverider-20M | yes | 1x | - | 5x | 9x
Fun3D [waverider-20M w/chemistry] | Loop Time (Sec) | waverider-20M w/chemistry | no | 2,155 | - | - | 295
Fun3D [waverider-20M w/chemistry] | NRF | waverider-20M w/chemistry | yes | 1x | - | - | 11x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2024.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 370 | 618 | 1,448 | 2,524 | 5,188
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 2x | 4x | 7x | 14x
GROMACS [STMV] | ns/day | STMV | yes | 19 | 43 | 70 | 104 | -
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 6x | -

GROMACS [ADH Dodec] is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5 updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
GTC | Mpush/Sec | mpi#proc.in | yes | 136 | 435 | 796 | 1,575 | 2,997
GTC | NRF | mpi#proc.in | yes | 1x | 3x | 6x | 12x | 23x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_cde2498

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x L40S | 2x L40S | 4x L40S
MILC | Total Time (sec) | Apex Medium | no | 16,570 | 4,053 | 2,052 | 1,337
MILC | NRF | Apex Medium | yes | 1x | 4x | 7x | 11x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

3

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
NAMD [apoa1_npt_cuda] | ns/day | apoa1_npt_cuda | yes | 87.87 | 230 | 457 | 900 | 1,816
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 5x | 10x | 21x
NAMD [LaINDY ColVars] | ns/day | LaINDY ColVars | yes | 44.89 | - | 127 | 251 | 501
NAMD [LaINDY ColVars] | NRF | LaINDY ColVars | yes | 1x | - | 3x | 6x | 11x
NAMD [apoa1_nve_cuda] | ns/day | apoa1_nve_cuda | yes | 97.16 | 300 | 599 | 1,177 | 2,357
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 3x | 6x | 12x | 24x
NAMD [stmv_npt_cuda] | ns/day | stmv_npt_cuda | yes | 10.06 | 17 | 35 | 69 | 138
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 3x | 7x | 14x
NAMD [COVID-19 Spike Assembly] | ns/day | COVID-19 Spike Assembly | yes | 0.78 | 2 | 3 | 5 | -
NAMD [COVID-19 Spike Assembly] | NRF | COVID-19 Spike Assembly | yes | 1x | 2x | 4x | 6x | -
NAMD [stmv_nve_cuda] | ns/day | stmv_nve_cuda | yes | 10.49 | 22 | 44 | 88 | 176
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 2x | 4x | 8x | 17x

NAMD is measured by running multiple independent instances using MPS, except for the NAMD [COVID-19 Spike Assembly] dataset
Trifan A, Gorgun D, Salim M, et al. Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action. The International Journal of High Performance Computing Applications. 2022;36(5-6):603-623. doi:10.1177/10943420221113513
D. B. Sauer, N. Trebesch, J. J. Marden, N. Cocco, J. Song, A. Koide, S. Koide, E. Tajkhorshid, and D.-N. Wang. "Structural basis for the reaction cycle of DASS dicarboxylate transporters." eLife. 9, e61350 (2020). https://doi.org/10.7554/eLife.61350


RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2024_01

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x L40S | 2x L40S | 4x L40S | 8x L40S
RTM [Isotropic Radius 4] | Mcell/s | Isotropic Radius 4 | yes | 21,047 | 42,364 | 84,430 | 168,059 | 336,140
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 4x | 8x | 16x
RTM [TTI Radius 8 1-pass] | Mcell/s | TTI Radius 8 1-pass | yes | 7,213 | 14,646 | 28,941 | 57,186 | 114,228
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 8x | 16x
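
One way to read the multi-GPU columns is as parallel scaling efficiency: throughput on N GPUs divided by N times the single-GPU throughput. The sketch below applies that to the RTM [Isotropic Radius 4] row on L40S. `scaling_efficiency` is a hypothetical helper, and efficiency is a derived figure rather than something the table reports.

```python
def scaling_efficiency(throughput_by_gpus):
    """Return {gpu_count: efficiency} relative to the smallest GPU count.

    Efficiency 1.0 means perfectly linear scaling. Assumes a
    bigger-is-better throughput metric such as Mcell/s.
    """
    base_gpus = min(throughput_by_gpus)
    per_gpu_base = throughput_by_gpus[base_gpus] / base_gpus
    return {
        gpus: value / (gpus * per_gpu_base)
        for gpus, value in sorted(throughput_by_gpus.items())
    }


# RTM [Isotropic Radius 4] on L40S, Mcell/s (values taken from the table).
rtm_isotropic = {1: 42364, 2: 84430, 4: 168059, 8: 336140}

efficiency = scaling_efficiency(rtm_isotropic)
for gpus, eff in efficiency.items():
    print(f"{gpus}x L40S: {eff:.1%} of linear scaling")
```

For this row the efficiency stays above 99% through 8 GPUs, which is consistent with the NRF doubling at each step (2x, 4x, 8x, 16x).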

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

4.1.1

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 8x L40S
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 199 | 23
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 10x


Detailed L4 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

24-AT_24

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 1x L4 | 2x L4 | 4x L4 | 8x L4
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 10.40 | 55 | 109 | 220 | 440
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 5x | 10x | 21x | 42x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 10.43 | 56 | 111 | 220 | 442
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 5x | 11x | 21x | 42x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 82.11 | 266 | 536 | 1,065 | 2,145
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 3x | 7x | 13x | 26x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 90.62 | 272 | 544 | 1,093 | 2,231
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 3x | 6x | 12x | 25x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 358.07 | 1,281 | 2,519 | 5,144 | 10,383
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 4x | 7x | 14x | 29x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 365.31 | 1,280 | 2,567 | 5,176 | 10,395
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 4x | 7x | 14x | 28x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 3.28 | 21 | 41 | 83 | 166
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 6x | 13x | 25x | 51x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 23.08 | 113 | 226 | 451 | 902
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 5x | 10x | 20x | 39x

AMBER is measured by running multiple independent instances using MPS


GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V4.5 updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 4x L4 | 8x L4
GTC | Mpush/Sec | mpi#proc.in | yes | 136 | 657 | 1,244
GTC | NRF | mpi#proc.in | yes | 1x | 5x | 10x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_cde2498

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | AMD Dual Genoa 9654 (CPU-Only) | 2x L4 | 4x L4 | 8x L4
MILC | Total Time (sec) | Apex Medium | no | 16,570 | 5,873 | 3,000 | 1,618
MILC | NRF | Apex Medium | yes | 1x | 3x | 5x | 9x