

Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers and then assume linear scaling to extrapolate beyond 8 servers. The NRF varies by application.
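The NRF definition above can be sketched in a few lines of Python. The function `node_replacement_factor` is a hypothetical helper, not NVIDIA's tooling. It takes measured CPU-only results for 1 to 8 servers, interpolates linearly between measured server counts, and extrapolates linearly beyond the largest measured count; that interpolation/extrapolation rule is our assumption about what "linear scaling" means here. Time-to-solution metrics (marked "no" under "Bigger is better" in the tables) are converted to rates first. Because the tables report rounded, single-server CPU figures, an NRF recomputed from them will not always match the published value exactly.

```python
def node_replacement_factor(cpu_perf_by_nodes: dict[int, float],
                            gpu_perf: float,
                            higher_is_better: bool = True) -> float:
    """Estimate how many CPU-only servers match one GPU-accelerated server.

    cpu_perf_by_nodes maps a CPU server count (e.g. 1..8) to its measured result.
    Assumption: linear interpolation between measured counts, and linear
    extrapolation from the largest measured count beyond that.
    """
    # Convert time-to-solution results (lower is better) into rates,
    # so that "more is better" holds for every metric.
    def to_rate(x: float) -> float:
        return x if higher_is_better else 1.0 / x

    target = to_rate(gpu_perf)
    prev_n, prev_rate = 0, 0.0
    for n, perf in sorted(cpu_perf_by_nodes.items()):
        rate = to_rate(perf)
        if rate >= target:
            # The GPU result falls inside the measured CPU range: interpolate.
            return prev_n + (target - prev_rate) / (rate - prev_rate) * (n - prev_n)
        prev_n, prev_rate = n, rate
    # The GPU result exceeds every CPU measurement: extrapolate linearly.
    return target * prev_n / prev_rate


# AMBER PME-Cellulose_NPT_4fs: 4.34 ns/day on one CPU server vs 147 ns/day on 1x A100 SXM 80GB.
print(round(node_replacement_factor({1: 4.34}, 147.0)))  # 34, matching the table
```

With only a single-server measurement, the sketch reduces to the plain ratio of GPU to CPU throughput; supplying the 2- to 8-server CPU results lets sublinear CPU scaling raise the estimate.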

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.8 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: Cellulose, CUDA Version: 11.8 | LAMMPS Benchmark: SNAP, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.8

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2 (CPU), 11.8 (GPU)


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics of biomolecules

VERSION

22.0-AT_22.3 (Intel CPU 20.12-AT_21.12)

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 147 | 294 | 588 | 1,176 | 145 | 289 | 579 | 1,157 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 34x | 68x | 135x | 271x | 33x | 67x | 133x | 267x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 159 | 318 | 636 | 1,272 | 158 | 315 | 630 | 1,261 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 36x | 73x | 146x | 291x | 36x | 72x | 144x | 289x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 512 | 1,025 | 2,050 | 4,100 | 508 | 1,017 | 2,034 | 4,067 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 23x | 46x | 92x | 184x | 23x | 46x | 91x | 183x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 556 | 1,112 | 2,225 | 4,450 | 551 | 1,102 | 2,203 | 4,407 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 24x | 49x | 98x | 196x | 24x | 48x | 97x | 194x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 1,167 | 2,333 | 4,667 | 9,334 | 1,170 | 2,339 | 4,678 | 9,357 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 25x | 50x | 99x | 12x | 25x | 50x | 100x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 1,221 | 2,441 | 4,883 | 9,765 | 1,247 | 2,493 | 4,986 | 9,973 |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 26x | 51x | 102x | 13x | 26x | 52x | 105x |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 54 | 107 | 214 | 428 | 53 | 107 | 214 | 428 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 37x | 75x | 150x | 300x | 37x | 75x | 150x | 299x |
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 10.97 | 132 | 265 | 530 | 1,059 | 135 | 271 | 542 | 1,084 |
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 12x | 24x | 48x | 97x | 12x | 25x | 49x | 99x |

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
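
| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 36 | 20 | 11 | 7 | 44 | 25 | 13 | 9 |
| Chroma | NRF | szscl21_24_128 | yes | 1x | 32x | 55x | 99x | 163x | 26x | 46x | 84x | 129x |
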

FUN3D

Engineering

Suite of tools, actively developed at NASA for aeronautics and space technology, that models fluid flow

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

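| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 52 | 28 | 16 | 11 | 54 | 28 | 16 | 13 |
| FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 12x | 22x | 39x | 55x | 11x | 22x | 39x | 49x |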

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
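
| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 68 | 337 | - | 626 | - | 332 | - | 530 | - |
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 7x | - | 12x | - | 7x | - | 10x | - |
| GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 98 | 144 | 250 | 290 | 97 | 133 | 169 | 183 |
| GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 6x | 11x | 18x | 21x | 6x | 10x | 12x | 13x |
| GROMACS [STMV] | ns/day | STMV | yes | 4 | 23 | 40 | 59 | 112 | 23 | 39 | 58 | 80 |
| GROMACS [STMV] | NRF | STMV | yes | 1x | 5x | 10x | 15x | 28x | 5x | 10x | 14x | 20x |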
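The multi-GPU columns show how close each code comes to ideal scaling on one node. The sketch below uses a hypothetical helper, `scaling_efficiency`, that is not part of any benchmark tooling; it computes parallel efficiency, perf(n) / (n × perf(1)), from the A100 SXM 80GB ns/day values in the GROMACS Cellulose row and the AMBER PME-Cellulose_NPT_4fs row. It assumes a higher-is-better throughput metric; time-to-solution metrics would need to be inverted first.

```python
def scaling_efficiency(perf_by_gpus: dict[int, float]) -> dict[int, float]:
    """Parallel efficiency relative to the 1-GPU result: perf(n) / (n * perf(1))."""
    base = perf_by_gpus[1]
    return {n: perf / (n * base) for n, perf in sorted(perf_by_gpus.items())}


# ns/day on 1, 2, 4 and 8 A100 SXM 80GB GPUs, copied from the tables.
gromacs_cellulose = {1: 98, 2: 144, 4: 250, 8: 290}
amber_cellulose_npt = {1: 147, 2: 294, 4: 588, 8: 1176}

for name, data in [("GROMACS Cellulose", gromacs_cellulose),
                   ("AMBER PME-Cellulose_NPT_4fs", amber_cellulose_npt)]:
    eff = scaling_efficiency(data)
    print(name, {n: round(e, 2) for n, e in eff.items()})
```

Read this way, the AMBER PME-Cellulose run stays at 100% efficiency out to 8 GPUs, while GROMACS Cellulose falls to roughly 37% at 8 GPUs, which is consistent with the flatter growth of its NRF row.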

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

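| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | moi#proc.in | yes | 35 | 489 | 923 | 1,803 | 3,566 | 480 | 893 | 1,748 | 3,436 |
| GTC | NRF | moi#proc.in | yes | 1x | 14x | 27x | 53x | 104x | 14x | 26x | 51x | 100x |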

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 317 | 218 | 158 | 134 | 318 | 224 | 165 |
| ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 8x | 11x | 15x | 18x | 8x | 11x | 15x |
| ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 293 | 197 | 144 | 120 | 291 | 192 | 140 |
| ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 8x | 11x | 15x | 18x | 8x | 12x | 16x |

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 5.04E+08 | 9.50E+08 | 1.73E+09 | 3.14E+09 | 5.10E+08 | 9.40E+08 | 1.61E+09 | - |
| LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 9x | 17x | 30x | 5x | 9x | 15x | - |
| LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.39E+07 | 2.84E+08 | 5.21E+08 | 8.99E+08 | 1.57E+09 | 2.82E+08 | 5.06E+08 | 8.20E+08 | - |
| LAMMPS [EAM] | NRF | EAM | yes | 1x | 5x | 10x | 17x | 29x | 5x | 9x | 15x | - |
| LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.48E+05 | 4.44E+06 | 8.31E+06 | 1.52E+07 | 2.44E+07 | 4.45E+06 | 8.33E+06 | 1.43E+07 | 1.83E+07 |
| LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 17x | 31x | 58x | 92x | 17x | 32x | 54x | 69x |
| LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.13E+05 | 2.08E+06 | 4.13E+06 | 8.19E+06 | 1.58E+07 | 2.00E+06 | 4.01E+06 | 7.82E+06 | 1.53E+07 |
| LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 18x | 36x | 72x | 139x | 18x | 35x | 69x | 135x |
| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.75E+07 | 4.68E+08 | 8.72E+08 | 1.58E+09 | 2.78E+09 | 4.63E+08 | 8.24E+08 | 1.28E+09 | - |
| LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 17x | 32x | 57x | 101x | 17x | 30x | 46x | - |

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

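| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 2,116 | 1,233 | 648 | 373 | 2,173 | 1,145 | 660 | 647 |
| MILC | NRF | Apex Medium | yes | 1x | 37x | 64x | 122x | 212x | 36x | 69x | 120x | 122x |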

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

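| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 174 | 347 | 696 | 1,384 | 173 | 344 | 691 | 1,381 |
| NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 9x | 18x | 36x | 72x | 9x | 18x | 36x | 72x |
| NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 178 | 360 | 717 | 1,421 | 179 | 357 | 709 | 1,415 |
| NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 9x | 18x | 37x | 73x | 9x | 18x | 36x | 72x |
| NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 218 | 436 | 868 | 1,727 | 216 | 429 | 856 | 1,720 |
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 11x | 21x | 42x | 83x | 10x | 21x | 41x | 83x |
| NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 14 | 27 | 54 | 108 | 13 | 27 | 51 | 107 |
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 15x | 29x | 58x | 7x | 14x | 27x | 57x |
| NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 14 | 28 | 56 | 111 | 14 | 28 | 55 | 110 |
| NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 8x | 15x | 31x | 61x | 8x | 15x | 30x | 61x |
| NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 16 | 32 | 64 | 128 | 16 | 31 | 62 | 127 |
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 8x | 17x | 33x | 66x | 8x | 16x | 32x | 65x |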

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.0 CPU; V7.1 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 112 | 71 | 48 | 37 | 114 | 69 | 49 | 41 |
| Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 7x | 11x | 17x | 22x | 7x | 12x | 16x | 19x |

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,110 | 1,781 | 1,458 | 1,313 | 3,401 | 1,994 | 1,838 | - |
| Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 9x | 10x | 4x | 6x | 7x | - |
| Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 22,960 | 12,416 | 8,587 | 6,299 | 25,275 | 13,414 | 9,583 | 7,266 |
| Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 7x | 10x | 14x | 4x | 7x | 9x | 12x |

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 89,617 | 178,568 | 356,408 | 714,385 | 85,304 | 169,986 | 339,992 | 679,229 |
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 31x | 63x | 8x | 15x | 30x | 60x |
| RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 12,968 | 25,692 | 51,157 | 102,200 | 12,887 | 25,712 | 51,428 | 102,234 |
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x | 7x | 14x | 27x | 3x | 7x | 14x | 27x |
| RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 13,989 | 27,529 | 54,607 | 108,895 | 13,585 | 27,034 | 53,782 | 107,190 |
| RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 14x | 29x | 4x | 7x | 14x | 28x |

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 77 | 41 | 22 | 14 | 77 | 40 | 22 | 15 |
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 27x | 52x | 98x | 148x | 27x | 53x | 98x | 142x |

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.8 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: Cellulose, CUDA Version: 11.8 | LAMMPS Benchmark: SNAP, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.8

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2 (CPU), 11.8 (GPU)


Detailed A30 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics of biomolecules

VERSION

22.0-AT_22.3 (Intel CPU 20.12-AT_21.12)

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 80 | 161 | 321 | 643 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 19x | 37x | 74x | 148x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 84 | 168 | 336 | 671 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 19x | 38x | 77x | 154x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 332 | 665 | 1,329 | 2,659 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 15x | 30x | 60x | 120x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 349 | 698 | 1,396 | 2,793 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 31x | 61x | 123x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 908 | 1,815 | 3,631 | 7,262 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 10x | 19x | 39x | 77x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 910 | 1,821 | 3,641 | 7,282 |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 10x | 19x | 38x | 76x |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 29 | 58 | 116 | 231 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 20x | 40x | 81x | 162x |
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 10.97 | 99 | 199 | 398 | 795 |
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 9x | 18x | 36x | 72x |

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
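
| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|
| Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 35 | 18 | 11 |
| Chroma | NRF | szscl21_24_128 | yes | 1x | 33x | 62x | 103x |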

FUN3D

Engineering

Suite of tools, actively developed at NASA for aeronautics and space technology, that models fluid flow

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

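| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 111 | 56 | 29 | 18 |
| FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 11x | 21x | 35x |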

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
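
| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 68 | 190 | 218 | 377 | - |
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 4x | 7x | - |
| GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 51 | 77 | 116 | 135 |
| GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 4x | 9x | 10x |
| GROMACS [STMV] | ns/day | STMV | yes | 4 | 12 | 21 | 34 | 53 |
| GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 5x | 8x | 13x |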

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

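| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | moi#proc.in | yes | 35 | 276 | 523 | 1,030 | 2,040 |
| GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 59x |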

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 571 | 354 | 233 | 206 |
| ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 4x | 7x | 10x | 12x |
| ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 502 | 302 | 193 | 164 |
| ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 4x | 7x | 11x | 13x |

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 2.77E+08 | 5.34E+08 | 9.88E+08 | 1.42E+09 |
| LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 5x | 10x | 14x |
| LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.39E+07 | 1.36E+08 | 2.56E+08 | 4.60E+08 | 6.95E+08 |
| LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 5x | 9x | 13x |
| LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.48E+05 | 2.46E+06 | 4.73E+06 | 8.66E+06 | 1.28E+07 |
| LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 9x | 18x | 33x | 49x |
| LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.13E+05 | 1.07E+06 | 2.11E+06 | 4.21E+06 | 8.25E+06 |
| LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 9x | 19x | 37x | 73x |
| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.75E+07 | 2.34E+08 | 4.09E+08 | 7.48E+08 | 9.89E+08 |
| LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 8x | 15x | 27x | 36x |

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

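| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 4,920 | 2,131 | 1,145 | 713 |
| MILC | NRF | Apex Medium | yes | 1x | 16x | 37x | 69x | 111x |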

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 91 | 182 | 363 | 728 |
| NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 10x | 19x | 38x |
| NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 94 | 187 | 372 | 748 |
| NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 10x | 19x | 38x |
| NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 111 | 221 | 442 | 886 |
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 5x | 11x | 21x | 43x |
| NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 7 | 14 | 28 | 58 |
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 15x | 31x |
| NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 7 | 15 | 30 | 59 |
| NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 8x | 16x | 33x |
| NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 8 | 16 | 32 | 65 |
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 8x | 17x | 34x |

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.0 CPU; V7.1 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

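| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 239 | 109 | 68 | 50 |
| Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 7x | 12x | 16x |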

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,922 | 2,220 | 1,905 | 1,676 |
| Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 6x | 7x | 8x |
| Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 35,026 | 18,221 | 11,868 | 8,110 |
| Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 5x | 8x | 11x |

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 44,070 | 87,787 | 175,692 | 350,683 |
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 4x | 8x | 16x | 31x |
| RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 6,708 | 13,366 | 26,731 | 53,142 |
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 7x | 14x |
| RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,019 | 13,932 | 27,713 | 55,159 |
| RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x |

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

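| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30 |
|---|---|---|---|---|---|---|---|---|
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 156 | 80 | 41 | 23 |
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 14x | 26x | 51x | 92x |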

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: STMV, CUDA Version: 11.8 | LAMMPS Benchmark: SNAP, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.8


Detailed A40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics of biomolecules

VERSION

22.0-AT_22.3 (Intel CPU 20.12-AT_21.12)

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
|---|---|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 89 | 178 | 357 | 714 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 21x | 41x | 82x | 164x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 95 | 190 | 380 | 760 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 22x | 43x | 87x | 174x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 415 | 830 | 1,659 | 3,319 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 19x | 37x | 75x | 149x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 435 | 870 | 1,740 | 3,480 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 19x | 38x | 77x | 153x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 1,030 | 2,061 | 4,121 | 8,242 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 11x | 22x | 44x | 88x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 1,061 | 2,121 | 4,243 | 8,486 |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 22x | 44x | 89x |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 31 | 63 | 126 | 252 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 22x | 44x | 88x | 176x |
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 10.97 | 119 | 238 | 476 | 951 |
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 11x | 22x | 43x | 87x |

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
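
| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
|---|---|---|---|---|---|---|---|---|
| Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 78 | 41 | 22 | 13 |
| Chroma | NRF | szscl21_24_128 | yes | 1x | 15x | 28x | 52x | 89x |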

FUN3D

Engineering

Suite of tools, actively developed at NASA for aeronautics and space technology, that models fluid flow

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

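| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
|---|---|---|---|---|---|---|---|---|
| FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 231 | 117 | 59 | 32 |
| FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 10x | 19x |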

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
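
| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
|---|---|---|---|---|---|---|---|---|
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 68 | 279 | 336 | 472 | - |
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 6x | 7x | 9x | - |
| GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 70 | 110 | 146 | 163 |
| GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 8x | 11x | 12x |
| GROMACS [STMV] | ns/day | STMV | yes | 4 | 18 | 36 | 53 | 60 |
| GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 9x | 13x | 15x |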

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

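| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
|---|---|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | moi#proc.in | yes | 35 | 279 | 523 | 1,033 | 2,032 |
| GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 59x |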

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
|---|---|---|---|---|---|---|---|---|
| ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 741 | 420 | 262 | 223 |
| ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 6x | 9x | 11x |
| ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 747 | 415 | 253 | 192 |
| ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 3x | 5x | 9x | 12x |

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials

| Application | Metric | Test Module | Bigger Is Better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40 |
|---|---|---|---|---|---|---|---|---|
| LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.48E+05 | 6.85E+05 | 1.32E+06 | 2.50E+06 | 4.18E+06 |
| LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 2x | 3x | 9x | 16x |
| LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.13E+05 | 2.43E+05 | 4.87E+05 | 9.75E+05 | 1.93E+06 |
| LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 2x | 5x | 9x | 17x |
| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.75E+07 | 5.22E+07 | 1.03E+08 | 2.01E+08 | 3.49E+08 |
| LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 2x | 4x | 7x | 13x |

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 5,868 | 3,019 | 1,721 | 988
MILC | NRF | Apex Medium | yes | 1x | 14x | 26x | 46x | 80x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU: V 3.0a13; Intel CPU: V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Capable of systems up to 100M atoms; multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 103 | 209 | 418 | 837
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 11x | 22x | 44x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 109 | 220 | 440 | 883
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 6x | 11x | 22x | 45x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 145 | 293 | 586 | 1,175
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 7x | 14x | 28x | 57x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 8 | 15 | 30 | 60
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 16x | 32x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 8 | 16 | 32 | 64
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 9x | 18x | 35x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 10 | 20 | 39 | 79
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 10x | 20x | 41x
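NAMD throughput is quoted as Ave ns/day, meaning simulated nanoseconds per wall-clock day. Two conversions are handy. The first turns a seconds-per-step figure and an integration timestep into ns/day. The second turns ns/day into the days needed for a target trajectory length. The Python sketch below is a minimal illustration. The 2 fs timestep and the 10 ms step time are assumptions for the example, not parameters stated for these benchmarks.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def ns_per_day(seconds_per_step: float, timestep_fs: float) -> float:
    """Simulated ns per wall-clock day from step time and integration timestep."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


def days_to_simulate(target_ns: float, rate_ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns at a fixed throughput."""
    return target_ns / rate_ns_per_day


# Hypothetical: 10 ms per step with an assumed 2 fs timestep.
print(f"{ns_per_day(0.010, 2.0):.2f} ns/day")

# apoa1_nve_cuda rates from the A40 table: 1 microsecond (1,000 ns) of simulation.
for config, rate in {"CPU-only": 20.75, "8x A40": 1175.0}.items():
    print(f"{config}: {days_to_simulate(1_000, rate):.2f} days")
```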

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,810 | 2,165 | 1,818 | 1,678
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 6x | 7x | 8x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 25,429 | 11,993 | 8,385 | 6,005
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 8x | 11x | 15x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 31,014
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,617
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 6,403
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x
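RTM throughput is reported in Mcells/s. This sketch reads that unit as millions of grid-cell updates per second, an assumption based only on the unit name. Under that reading, the propagation time for a single pass over a grid follows directly. The Python sketch below uses a hypothetical 1000 × 1000 × 1000 grid and 2,000 timesteps with the Isotropic Radius 4 figures from the A40 table. A full migration involves additional passes and I/O that this ignores.

```python
def propagation_seconds(nx: int, ny: int, nz: int, n_steps: int, mcells_per_s: float) -> float:
    """Time for one propagation pass, assuming Mcells/s counts cell updates per second."""
    cell_updates = nx * ny * nz * n_steps
    return cell_updates / (mcells_per_s * 1e6)


# Isotropic Radius 4 throughputs (Mcells/s) from the A40 table.
rtm = {"CPU-only": 11_318, "1x A40": 31_014}

# Hypothetical survey: 1000^3 grid, 2,000 timesteps.
for config, rate in rtm.items():
    print(f"{config}: {propagation_seconds(1000, 1000, 1000, 2000, rate):.1f} s")
```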

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 203 | 103 | 53 | 29
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 10x | 20x | 40x | 73x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.8 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.8 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.8 | GROMACS Benchmark: Cellulose, CUDA Version: 11.8 | LAMMPS Benchmark: SNAP, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.8

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2 (CPU) | 11.8 (GPU)


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3 (Intel CPU 20.12-AT_21.12)

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 91 | 182 | 363 | 727 | 76 | 152 | 304 | 607
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 21x | 42x | 84x | 167x | 17x | 35x | 70x | 140x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 96 | 192 | 384 | 768 | 94 | 188 | 377 | 754
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 22x | 44x | 88x | 176x | 22x | 43x | 86x | 172x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 392 | 784 | 1,568 | 3,135 | 340 | 680 | 1,359 | 2,718
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 18x | 35x | 70x | 141x | 15x | 31x | 61x | 122x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 418 | 836 | 1,673 | 3,345 | 358 | 716 | 1,432 | 2,863
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 18x | 37x | 74x | 147x | 16x | 31x | 63x | 126x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 1,004 | 2,007 | 4,014 | 8,029 | 975 | 1,950 | 3,899 | 7,798
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 11x | 21x | 43x | 86x | 10x | 21x | 42x | 83x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 1,059 | 2,118 | 4,237 | 8,473 | 1,026 | 2,052 | 4,103 | 8,206
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 22x | 44x | 89x | 11x | 22x | 43x | 86x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 31 | 62 | 124 | 249 | 22 | 44 | 88 | 176
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 22x | 44x | 87x | 174x | 15x | 31x | 62x | 123x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 10.97 | 119 | 239 | 478 | 955 | 119 | 237 | 475 | 949
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 11x | 22x | 44x | 87x | 11x | 22x | 43x | 87x
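The AMBER table lists the same GPU counts for V100 SXM2 and V100S PCIe. Per-GPU throughput and the SXM2-to-PCIe ratio therefore show how each form factor scales. The Python sketch below applies this to the PME-STMV_NPT_4fs ns/day row. A roughly constant ratio across GPU counts suggests both variants scale about equally well, so the gap reflects per-GPU speed rather than multi-GPU scaling.

```python
# PME-STMV_NPT_4fs ns/day from the V100 table, keyed by GPU count.
sxm2 = {1: 31.0, 2: 62.0, 4: 124.0, 8: 249.0}
pcie = {1: 22.0, 2: 44.0, 4: 88.0, 8: 176.0}

# Throughput ratio of SXM2 to PCIe at each GPU count.
ratios = {n: sxm2[n] / pcie[n] for n in sxm2}

for n in sorted(sxm2):
    print(f"{n} GPU(s): {sxm2[n] / n:.1f} vs {pcie[n] / n:.1f} ns/day per GPU, "
          f"SXM2/PCIe = {ratios[n]:.2f}")
```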

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 165 | 31 | 17 | 10 | 142 | 28 | 15 | 13
Chroma | NRF | szscl21_24_128 | yes | 1x | 7x | 37x | 68x | 111x | 8x | 41x | 77x | 85x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 99 | 50 | 26 | 15 | 87 | 45 | 23 | 14
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 12x | 24x | 41x | 6x | 14x | 26x | 43x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
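
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 68 | 219 | 255 | 461 | - | 222 | 253 | 314 | 228 | 265 | 320
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 4x | 5x | 9x | - | 4x | 5x | 6x | 4x | 5x | 6x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 60 | 92 | 149 | - | 53 | 82 | - | 63 | 91 | 98
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 5x | 11x | - | 3x | 5x | - | 3x | 5x | 6x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 15 | 27 | 43 | 52 | 12 | 24 | 31 | 15 | 27 | 36
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 6x | 11x | 13x | 3x | 5x | 7x | 3x | 6x | 9x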

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node


Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 268 | 512 | 1,020 | 1,997 | 296 | 556 | 1,098 | 1,813
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 58x | 9x | 16x | 32x | 53x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 591 | 353 | 223 | 167 | 819 | 578 | 248
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 4x | 7x | 11x | 15x | 3x | 4x | 10x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 514 | 304 | 192 | 143 | 697 | 438 | 215
ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 4x | 7x | 12x | 16x | 3x | 5x | 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_23Jun2022_update1

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, and many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 3.52E+08 | 6.64E+08 | 1.29E+09 | 2.34E+09 | 3.56E+08 | 6.46E+08 | 1.17E+09 | 1.89E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 6x | 12x | 23x | 3x | 6x | 11x | 18x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.39E+07 | 1.24E+08 | 2.66E+08 | 5.37E+08 | 9.61E+08 | 1.27E+08 | 2.65E+08 | 5.07E+08 | 8.19E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 5x | 10x | 18x | 2x | 5x | 9x | 15x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.48E+05 | 2.95E+06 | 5.66E+06 | 1.07E+07 | 1.84E+07 | 3.10E+06 | 5.84E+06 | 1.09E+07 | 1.76E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 11x | 21x | 41x | 70x | 12x | 22x | 41x | 67x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.13E+05 | 1.41E+06 | 2.83E+06 | 5.68E+06 | 1.13E+07 | 1.40E+06 | 2.79E+06 | 5.60E+06 | 1.11E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 12x | 25x | 50x | 99x | 12x | 25x | 49x | 98x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.75E+07 | 2.74E+08 | 5.02E+08 | 9.69E+08 | 1.78E+09 | 2.85E+08 | 5.10E+08 | 9.65E+08 | 1.55E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 10x | 18x | 35x | 64x | 10x | 18x | 35x | 56x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles such as protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 5,034 | 2,434 | 1,270 | 712 | 3,885 | 2,082 | 1,137 | 1,059
MILC | NRF | Apex Medium | yes | 1x | 16x | 33x | 62x | 111x | 20x | 38x | 70x | 75x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU: V 3.0a13; Intel CPU: V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Capable of systems up to 100M atoms; multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

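Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 8x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 111 | 223 | 447 | 890 | 67 | 134 | 267 | 533 | 114 | 228 | 457 | 904
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 6x | 12x | 23x | 46x | 4x | 7x | 14x | 28x | 6x | 12x | 24x | 47x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 116 | 234 | 468 | 935 | 71 | 142 | 283 | 564 | 119 | 237 | 474 | 945
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 6x | 12x | 24x | 48x | 4x | 7x | 14x | 29x | 6x | 12x | 24x | 48x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 141 | 285 | 570 | 1,146 | 90 | 180 | 360 | 721 | 144 | 288 | 572 | 1,138
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 7x | 14x | 27x | 55x | 4x | 9x | 17x | 35x | 7x | 14x | 28x | 55x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 9 | 17 | 34 | 68 | 5 | 10 | 21 | 42 | 9 | 18 | 35 | 70
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 5x | 9x | 18x | 36x | 3x | 6x | 11x | 22x | 5x | 10x | 19x | 37x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 9 | 18 | 36 | 71 | 5 | 11 | 22 | 43 | 9 | 18 | 36 | 73
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 5x | 10x | 20x | 39x | 3x | 6x | 12x | 24x | 5x | 10x | 20x | 40x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 10 | 20 | 40 | 79 | 6 | 12 | 26 | 51 | 10 | 20 | 40 | 80
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 10x | 20x | 41x | 3x | 6x | 13x | 26x | 5x | 10x | 21x | 41x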

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x V100 SXM2 32GB | 4x V100S PCIe 32GB
WRF | Seconds / Timestep | Conus_2.5k_JA | no | 6 | 0.62 | 0.68
WRF | NRF | Conus_2.5k_JA | yes | 1x | 10x | 9x
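The WRF metric is wall-clock seconds per model timestep, where lower is better. Multiplying it by the number of steps in a forecast gives the forecast's run time. The Python sketch below assumes a hypothetical 6-hour forecast with a 15 s model timestep. Neither value comes from the Conus_2.5k_JA benchmark definition. The sketch applies them to the 0.62 s per step result for 4x V100 SXM2 32GB.

```python
def forecast_wall_seconds(forecast_hours: float, model_dt_s: float, seconds_per_step: float) -> float:
    """Wall-clock seconds for a forecast at a fixed cost per model timestep."""
    n_steps = round(forecast_hours * 3600 / model_dt_s)
    return n_steps * seconds_per_step


# Hypothetical forecast setup; 0.62 s/step is the 4x V100 SXM2 32GB WRF result.
seconds = forecast_wall_seconds(6, 15, 0.62)
print(f"{seconds / 60:.1f} min")
```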

Quantum Espresso

Materials Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V7.0 CPU; V7.1 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 271 | 135 | 87 | 60 | 337 | 140 | 92 | 76
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 6x | 9x | 13x | 2x | 6x | 9x | 10x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,569 | 2,173 | - | 1,673 | 3,559 | 2,190 | - | 1,718
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 6x | - | 8x | 4x | 6x | - | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | - | 17,797 | 12,110 | 8,463 | 34,191 | 18,472 | 12,393 | 8,695
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | - | 5x | 7x | 11x | 3x | 5x | 7x | 10x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 38,168 | 76,056 | 152,093 | 304,139 | 46,001 | 91,749 | 183,356 | 366,874
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 7x | 13x | 27x | 4x | 8x | 16x | 32x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,521 | 16,812 | 32,958 | 65,243 | 9,221 | 18,331 | 36,281 | 72,384
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 9x | 17x | 2x | 5x | 10x | 19x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,163 | 14,198 | 28,184 | 56,246 | 8,498 | 16,853 | 33,508 | 66,865
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x | 2x | 4x | 9x | 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 159 | 82 | 43 | 26 | 131 | 68 | 36 | 24
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 13x | 26x | 49x | 81x | 16x | 31x | 59x | 88x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA T4 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.8

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA T4 PCIe | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.6 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.8

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA T4 PCIe | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.8 | GROMACS Benchmark: ADH Dodec, CUDA Version: 11.8 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.8 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA T4 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.8


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

22.0-AT_22.3 (Intel CPU 20.12-AT_21.12)

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 52 | 105 | 210
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 12x | 24x | 48x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 53 | 106 | 213
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 12x | 24x | 49x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 259 | 518 | 1,037
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 12x | 23x | 47x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 260 | 521 | 1,042
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 11x | 23x | 46x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 958 | 1,915 | 3,831
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 10x | 20x | 41x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 955 | 1,911 | 3,822
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 10x | 20x | 40x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 18 | 37 | 73
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 13x | 26x | 51x
AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 10.97 | 99 | 197 | 394
AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 9x | 18x | 36x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 117 | 40 | 26
Chroma | NRF | szscl21_24_128 | yes | 1x | 10x | 28x | 44x

FUN3D

Engineering

Suite of tools, actively developed at NASA, for modeling fluid flow in aeronautics and space technology

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 495 | 287 | 144 | 74
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 3x | 8x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 68 | 127 | 230 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 5x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 20 | 38 | 61 | 74
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 3x | 4x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 10 | 17 | 27
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 6x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node


Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 232 | 463 | 853
GTC | NRF | moi#proc.in | yes | 1x | 7x | 13x | 25x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,431 | 971 | 549 | 404
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 4x | 6x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,213 | 954 | 531 | 352
ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 4x | 6x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles such as protons and neutrons

VERSION

feature/gauge-action-quda_16a2d47119

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 72,042 | 7,335 | 3,801 | 2,098
MILC | NRF | Apex Medium | yes | 1x | 11x | 21x | 38x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU: V 3.0a13; Intel CPU: V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Capable of systems up to 100M atoms; multi-GPU, single node

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.15 | 57 | 115 | 231
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 6x | 12x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.59 | 60 | 120 | 240
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 3x | 6x | 12x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.75 | 77 | 153 | 305
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 7x | 15x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.87 | 4 | 9 | 18
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 5x | 9x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.81 | 5 | 9 | 18
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 3x | 5x | 10x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.94 | 5 | 10 | 21
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 5x | 11x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
WRF | Seconds / Timestep | Conus_2.5k_JA | no | 5.51 | 0.90
WRF | NRF | Conus_2.5k_JA | yes | 1x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,377 | 2,567 | 1,815
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 5x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 30,651 | 18,100 | 11,239
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 5x | 8x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 22,060 | 44,168 | 88,386
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 4x | 8x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 5,881 | 11,668 | 23,111
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 6x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | - | 9,825 | 19,610
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | - | 3x | 5x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_fef2ace9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,847 | 239 | 122 | 64
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 9x | 17x | 33x