

Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The node replacement factor (NRF) is the number of CPU-only servers that a single GPU-accelerated server replaces. To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers. The NRF varies by application.
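As a rough illustration of the linear-scaling idea above, the NRF can be approximated by the speedup of one GPU server over one CPU-only server. This is a minimal sketch under that assumption, not the procedure used to produce the published figures; the function name `node_replacement_factor` and its parameters are hypothetical.

```python
def node_replacement_factor(cpu_value, gpu_value, bigger_is_better=True):
    """Estimate how many CPU-only servers one GPU server replaces.

    Assumption: CPU-only performance scales linearly with server count,
    so the NRF is approximated by the single-server speedup.
    """
    if cpu_value <= 0 or gpu_value <= 0:
        raise ValueError("metric values must be positive")
    if bigger_is_better:
        # Throughput metrics (e.g. ns/day): more is better.
        return gpu_value / cpu_value
    # Time metrics (e.g. seconds): less is better, so invert the ratio.
    return cpu_value / gpu_value


# AMBER PME-Cellulose_NPT_4fs: 4.40 ns/day CPU-only, 151 ns/day on 1x A100 SXM 80GB.
amber_nrf = node_replacement_factor(4.40, 151)
print(f"AMBER Cellulose NPT: about {amber_nrf:.0f}x")
```

For time-based metrics such as total seconds, pass `bigger_is_better=False`. Published NRF values can differ from this simple ratio, since they are derived from unrounded measurements and from CPU runs on several servers.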

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.4.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.4.2 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.4.2

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.4.2 | GROMACS Benchmark: Cellulose, CUDA Version: 11.4.2 | LAMMPS Benchmark: SNAP, CUDA Version: 11.4.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.4.2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.4.2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.3.1


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.12-AT_21.10

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.40 | 151 | 302 | 603 | 1,207 | 147 | 294 | 589 | 1,178
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 34x | 69x | 137x | 274x | 33x | 67x | 134x | 268x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.41 | 163 | 327 | 653 | 1,306 | 162 | 324 | 648 | 1,295
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 37x | 74x | 148x | 296x | 37x | 73x | 147x | 294x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.80 | 536 | 1,072 | 2,144 | 4,288 | 532 | 1,064 | 2,128 | 4,255
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 24x | 47x | 94x | 188x | 23x | 47x | 93x | 187x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.13 | 569 | 1,138 | 2,276 | 4,552 | 576 | 1,152 | 2,304 | 4,609
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 25x | 49x | 98x | 197x | 25x | 50x | 100x | 199x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 97.51 | 1,200 | 2,399 | 4,799 | 9,598 | 1,209 | 2,417 | 4,835 | 9,670
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 25x | 49x | 98x | 12x | 25x | 50x | 99x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 99.02 | 1,272 | 2,544 | 5,089 | 10,178 | 1,278 | 2,556 | 5,113 | 10,225
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 26x | 51x | 103x | 13x | 26x | 52x | 103x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.42 | 55 | 111 | 221 | 442 | 53 | 106 | 213 | 426
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 39x | 78x | 156x | 311x | 37x | 75x | 150x | 300x

Benchmarked in ensemble mode

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 36 | 20 | 11 | 7 | 44 | 25 | 13 | 9
Chroma | NRF | szscl21_24_128 | yes | 1x | 32x | 55x | 99x | 163x | 26x | 46x | 84x | 129x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 52 | 28 | 16 | 11 | 54 | 29 | 16 | 12
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 13x | 24x | 41x | 57x | 12x | 23x | 40x | 53x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 329 | - | 477 | - | 327 | - | 494 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 8x | - | 11x | - | 8x | - | 12x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 96 | 139 | 226 | 248 | 96 | 130 | 169 | 187
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 9x | 13x | 21x | 23x | 9x | 12x | 16x | 18x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 22 | 38 | 56 | 112 | 22 | 38 | 54 | 77
GROMACS [STMV] | NRF | STMV | yes | 1x | 6x | 11x | 16x | 31x | 6x | 11x | 15x | 22x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 489 | 923 | 1,803 | 3,566 | 480 | 893 | 1,748 | 3,436
GTC | NRF | moi#proc.in | yes | 1x | 14x | 27x | 53x | 104x | 14x | 26x | 51x | 100x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 13 | 9 | 7 | 6 | 14 | 11
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 6x | 9x | 11x | 13x | 6x | 7x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 25 | 15 | 10 | 8 | 26 | 17
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 6x | 10x | 15x | 18x | 6x | 9x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Sep2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 5.54E+08 | 1.04E+09 | 1.93E+09 | 3.61E+09 | 5.25E+08 | 9.61E+08 | 1.76E+09 | 2.76E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 10x | 19x | 35x | 5x | 9x | 17x | 27x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.09E+07 | 2.82E+08 | 5.10E+08 | 8.86E+08 | 1.50E+09 | 2.70E+08 | 4.81E+08 | 8.13E+08 | 1.26E+09
LAMMPS [EAM] | NRF | EAM | yes | 1x | 6x | 10x | 17x | 29x | 5x | 9x | 16x | 25x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.41E+05 | 3.06E+06 | 5.32E+06 | 8.73E+06 | 1.23E+07 | 3.04E+06 | 5.30E+06 | 8.55E+06 | 1.14E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 15x | 26x | 43x | 61x | 15x | 26x | 43x | 57x
LAMMPS [Rhodopsin] | ATOM-Time Steps/s | Rhodopsin | yes | 7.34E+06 | - | 1.92E+07 | 3.13E+07 | 4.71E+07 | - | 1.86E+07 | 2.80E+07 | 3.20E+07
LAMMPS [Rhodopsin] | NRF | Rhodopsin | yes | 1x | - | 3x | 5x | 7x | - | 2x | 4x | 5x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.14E+05 | 2.09E+06 | 4.16E+06 | 8.21E+06 | 1.58E+07 | 2.05E+06 | 4.09E+06 | 8.11E+06 | 1.56E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 18x | 36x | 72x | 138x | 18x | 36x | 71x | 137x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.78E+07 | 4.61E+08 | 8.27E+08 | 1.46E+09 | 2.24E+09 | 4.44E+08 | 7.82E+08 | 1.33E+09 | 1.84E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 17x | 30x | 52x | 80x | 16x | 28x | 48x | 66x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_d98de0c4

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 2,260 | 1,319 | 700 | 394 | 2,374 | 1,373 | 660 | 624
MILC | NRF | Apex Medium | yes | 1x | 33x | 56x | 106x | 188x | 31x | 54x | 112x | 119x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 121 | 243 | 490 | 974 | 121 | 227 | 479 | 961
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 6x | 13x | 26x | 51x | 6x | 12x | 25x | 50x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 126 | 250 | 498 | 995 | 121 | 244 | 483 | 974
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 6x | 13x | 25x | 51x | 6x | 12x | 25x | 50x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 164 | 326 | 649 | 1,310 | 160 | 324 | 647 | 1,275
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 8x | 16x | 31x | 63x | 8x | 16x | 31x | 62x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 12 | 24 | 47 | 95 | 12 | 23 | 46 | 92
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 13x | 27x | 54x | 6x | 13x | 26x | 52x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 12 | 24 | 49 | 97 | 12 | 24 | 47 | 94
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 7x | 13x | 27x | 54x | 7x | 13x | 26x | 52x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 14 | 28 | 55 | 110 | 14 | 27 | 54 | 107
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 7x | 14x | 28x | 56x | 7x | 14x | 27x | 55x

Benchmarked in ensemble mode

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

CPU 6.7; GPU 6.8

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 135 | 84 | 58 | 49 | 147 | 92 | 78
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 6x | 10x | 15x | 17x | 6x | 9x | 11x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,110 | 1,781 | 1,458 | 1,313 | 3,401 | 1,994 | 1,838 | 1,735
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 5x | 8x | 10x | 11x | 4x | 7x | 8x | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | 22,960 | 12,416 | 8,587 | 6,299 | 25,275 | 13,414 | 9,583 | 7,266
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 9x | 14x | 19x | 4x | 9x | 12x | 16x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 89,560 | 178,509 | 357,150 | 714,213 | 89,576 | 178,629 | 357,258 | 714,486
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 32x | 63x | 8x | 16x | 32x | 63x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 12,853 | 25,433 | 50,556 | 101,167 | 12,930 | 25,795 | 51,541 | 102,613
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x | 7x | 13x | 27x | 3x | 7x | 14x | 27x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 13,940 | 27,580 | 54,758 | 108,322 | 13,716 | 27,221 | 54,185 | 107,829
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 15x | 29x | 4x | 7x | 14x | 29x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0a5acff9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 77 | 40 | 22 | 14 | 78 | 41 | 22 | 15
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 30x | 56x | 104x | 159x | 29x | 56x | 103x | 151x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.4.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.4.2 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.4.2

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.4.2 | GROMACS Benchmark: Cellulose, CUDA Version: 11.4.2 | LAMMPS Benchmark: SNAP, CUDA Version: 11.4.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.4.2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.4.2

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.3.1 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2


Detailed A30 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.12-AT_21.10

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.40 | 81 | 162 | 324 | 648
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 18x | 37x | 74x | 147x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.41 | 86 | 171 | 342 | 685
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 19x | 39x | 78x | 155x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.80 | 333 | 666 | 1,333 | 2,665
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 15x | 29x | 58x | 117x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.13 | 353 | 707 | 1,414 | 2,827
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 31x | 61x | 122x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 97.51 | 902 | 1,804 | 3,608 | 7,216
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 9x | 19x | 37x | 74x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 99.02 | 946 | 1,892 | 3,783 | 7,566
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 10x | 19x | 38x | 76x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.42 | 29 | 58 | 117 | 234
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 21x | 41x | 82x | 165x

Benchmarked in ensemble mode

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A30 | 4x A30 | 8x A30
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 35 | 18 | 11
Chroma | NRF | szscl21_24_128 | yes | 1x | 33x | 62x | 103x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 111 | 56 | 30 | 18
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 12x | 22x | 36x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 191 | - | 354 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | - | 8x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 51 | 75 | 112 | 127
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 5x | 10x | 12x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 12 | 21 | 33 | 51
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 5x | 9x | 14x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 276 | 523 | 1,030 | 2,040
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 59x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 24 | 16 | - | -
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | - | -
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 47 | 28 | 23 | 19
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 3x | 5x | 6x | 8x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Sep2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 2.65E+08 | 5.03E+08 | 9.42E+08 | 1.72E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 5x | 9x | 17x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.09E+07 | 1.29E+08 | 2.43E+08 | 4.44E+08 | 7.59E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 5x | 9x | 15x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.41E+05 | 1.68E+06 | 3.11E+06 | 5.45E+06 | 8.11E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 8x | 15x | 27x | 40x
LAMMPS [Rhodopsin] | ATOM-Time Steps/s | Rhodopsin | yes | 7.34E+06 | - | 1.16E+07 | 1.96E+07 | -
LAMMPS [Rhodopsin] | NRF | Rhodopsin | yes | 1x | - | 2x | 3x | -
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.14E+05 | 1.07E+06 | 2.13E+06 | 4.24E+06 | 8.19E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 9x | 19x | 37x | 72x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.78E+07 | 2.24E+08 | 4.19E+08 | 7.58E+08 | 1.10E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 8x | 15x | 27x | 39x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_d98de0c4

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 5,451 | 2,313 | 1,167 | 691
MILC | NRF | Apex Medium | yes | 1x | 14x | 32x | 63x | 107x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 73 | 146 | 291 | 582
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 4x | 8x | 15x | 30x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 76 | 151 | 302 | 601
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 4x | 8x | 15x | 31x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 92 | 184 | 369 | 737
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 9x | 18x | 36x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 6 | 13 | 25 | 50
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 7x | 14x | 28x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 6 | 13 | 26 | 51
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 7x | 14x | 28x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 7 | 14 | 28 | 56
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 7x | 15x | 29x

Benchmarked in ensemble mode

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

CPU 6.7; GPU 6.8

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 287 | 123 | 96
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 7x | 9x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,922 | 2,220 | 1,905 | 1,676
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 8x | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | 35,026 | 18,221 | 11,868 | 8,110
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 5x | 10x | 14x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 44,068 | 87,787 | 175,548 | 350,754
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 4x | 8x | 16x | 31x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 6,688 | 13,223 | 26,359 | 52,547
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 7x | 14x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 6,977 | 13,876 | 27,520 | 54,789
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0a5acff9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 156 | 80 | 42 | 23
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 15x | 28x | 54x | 98x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.4.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.4.2

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.4.2 | GROMACS Benchmark: Cellulose, CUDA Version: 11.4.2 | LAMMPS Benchmark: SNAP, CUDA Version: 11.4.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.4.2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.4.2

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.3.1


Detailed A40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.12-AT_21.10

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.40 | 90 | 181 | 361 | 722
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 21x | 41x | 82x | 164x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.41 | 96 | 192 | 384 | 768
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 22x | 44x | 87x | 174x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.80 | 413 | 826 | 1,652 | 3,305
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 18x | 36x | 72x | 145x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.13 | 438 | 875 | 1,750 | 3,500
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 19x | 38x | 76x | 151x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 97.51 | 1,007 | 2,015 | 4,030 | 8,059
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 10x | 21x | 41x | 83x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 99.02 | 1,059 | 2,119 | 4,237 | 8,474
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 21x | 43x | 86x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.42 | 32 | 63 | 126 | 252
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 22x | 44x | 89x | 178x

Benchmarked in ensemble mode

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 78 | 41 | 22 | 13
Chroma | NRF | szscl21_24_128 | yes | 1x | 15x | 28x | 52x | 89x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 226 | 115 | 59 | 32
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 11x | 20x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 303 | 317 | 485 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 7x | 8x | 12x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 71 | 91 | 143 | -
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 5x | 9x | 13x | -
GROMACS [STMV] | ns/day | STMV | yes | 4 | 17 | 35 | 51 | 62
GROMACS [STMV] | NRF | STMV | yes | 1x | 5x | 10x | 14x | 17x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 279 | 523 | 1,033 | 2,032
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 59x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 29 | 18 | - | 16
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 4x | - | 5x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 66 | 36 | 27 | 21
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 4x | 5x | 7x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Sep2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | - | 2.15E+08 | 4.10E+08 | 7.72E+08
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | - | 2x | 4x | 7x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.09E+07 | - | 9.98E+07 | 1.88E+08 | 3.51E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | - | 2x | 4x | 7x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.41E+05 | - | 7.50E+05 | 1.39E+06 | 2.38E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | - | 3x | 5x | 12x
LAMMPS [Rhodopsin] | ATOM-Time Steps/s | Rhodopsin | yes | 7.34E+06 | - | - | 1.26E+07 | 1.92E+07
LAMMPS [Rhodopsin] | NRF | Rhodopsin | yes | 1x | - | - | 2x | 3x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.14E+05 | 2.45E+05 | 4.90E+05 | 9.80E+05 | 1.95E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 2x | 5x | 9x | 17x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.78E+07 | 4.45E+07 | 8.78E+07 | 1.70E+08 | 3.19E+08
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 2x | 3x | 6x | 11x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_d98de0c4

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 6,058 | 3,223 | 1,641 | 933
MILC | NRF | Apex Medium | yes | 1x | 12x | 23x | 45x | 79x
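
The published NRF values are not a simple ratio of the timings in the MILC table above. As a rough cross-check, the following Python sketch computes the raw single-server speedup over the CPU-only baseline and the multi-GPU scaling efficiency from the MILC A40 figures. The helpers `raw_speedup` and `scaling_efficiency` are hypothetical names introduced here for illustration, not part of any NVIDIA tool. The raw ratio is also not the NRF itself: NRF is derived from CPU scaling measurements on up to 8 servers, which this sketch does not attempt to reproduce.

```python
def raw_speedup(cpu_value, gpu_value, bigger_is_better):
    """Ratio of GPU-server performance to the CPU-only baseline."""
    if bigger_is_better:
        return gpu_value / cpu_value   # throughput metric, e.g. ns/day
    return cpu_value / gpu_value       # time metric, e.g. Total Time (Sec)


def scaling_efficiency(results, bigger_is_better):
    """Efficiency per GPU count relative to the smallest GPU count measured.

    `results` maps GPU count -> metric value; 1.0 means perfect linear scaling.
    """
    base_count = min(results)
    base_value = results[base_count]
    efficiency = {}
    for count, value in sorted(results.items()):
        gain = raw_speedup(base_value, value, bigger_is_better)
        efficiency[count] = gain / (count / base_count)
    return efficiency


# MILC, Apex Medium, Total Time (Sec), copied from the A40 table above.
milc_cpu = 67366
milc_a40 = {1: 6058, 2: 3223, 4: 1641, 8: 933}

speedups = {n: raw_speedup(milc_cpu, t, bigger_is_better=False)
            for n, t in milc_a40.items()}
efficiencies = scaling_efficiency(milc_a40, bigger_is_better=False)

for n in sorted(milc_a40):
    print(f"{n}x A40: raw speedup {speedups[n]:.1f}x, "
          f"scaling efficiency {efficiencies[n]:.0%}")
```

For the 1x A40 column this gives a raw speedup of about 11.1x against a published NRF of 12x, so the published figure should be read from the table rather than recomputed this way.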

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 86 | 171 | 344 | 691
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 4x | 9x | 18x | 36x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 91 | 182 | 365 | 733
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 9x | 19x | 37x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 130 | 260 | 525 | 1,058
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 6x | 13x | 25x | 51x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 7 | 14 | 29 | 58
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 16x | 32x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 7 | 15 | 30 | 60
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 8x | 17x | 33x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 9 | 18 | 37 | 74
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 9x | 19x | 38x

Benchmarked in ensemble mode

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,810 | 2,165 | 1,818 | 1,678
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 8x | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | 25,429 | 11,993 | 8,385 | 6,005
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 10x | 14x | 19x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 30,995
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,625
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 6,406
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0a5acff9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 204 | 104 | 54 | 30
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 11x | 22x | 42x | 77x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.4.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.4.2 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.4.2

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.4.2 | GROMACS Benchmark: Cellulose, CUDA Version: 11.4.2 | LAMMPS Benchmark: SNAP, CUDA Version: 11.4.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.4.2

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.3.1


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics of biomolecules

VERSION

20.12-AT_21.10

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.40 | 96 | 191 | 383 | 766 | 99 | 197 | 395 | 789
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 22x | 44x | 87x | 174x | 22x | 45x | 90x | 179x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.41 | 103 | 205 | 410 | 821 | 105 | 211 | 421 | 843
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 23x | 47x | 93x | 186x | 24x | 48x | 96x | 191x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.80 | 423 | 845 | 1,690 | 3,380 | 433 | 866 | 1,733 | 3,465
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 19x | 37x | 74x | 148x | 19x | 38x | 76x | 152x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.13 | 451 | 903 | 1,806 | 3,611 | 464 | 928 | 1,856 | 3,711
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 20x | 39x | 78x | 156x | 20x | 40x | 80x | 160x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 97.51 | 1,136 | 2,272 | 4,544 | 9,088 | 1,170 | 2,340 | 4,680 | 9,360
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 23x | 47x | 93x | 12x | 24x | 48x | 96x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 99.02 | 1,193 | 2,385 | 4,771 | 9,542 | 1,247 | 2,494 | 4,988 | 9,976
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 48x | 96x | 13x | 25x | 50x | 101x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.42 | 33 | 65 | 130 | 261 | 33 | 67 | 133 | 267
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 23x | 46x | 92x | 183x | 23x | 47x | 94x | 188x

Benchmarked in ensemble mode

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 165 | 31 | 17 | 10 | 142 | 28 | 15 | 13
Chroma | NRF | szscl21_24_128 | yes | 1x | 7x | 37x | 68x | 111x | 8x | 41x | 77x | 85x

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 99 | 50 | 26 | 15 | 88 | 45 | 24 | 14
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 6x | 13x | 25x | 43x | 7x | 15x | 28x | 47x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 224 | 259 | 435 | - | 226 | 250 | 314 | 234 | - | 325
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 6x | 10x | - | 5x | 6x | 8x | 6x | - | 8x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 61 | 91 | 145 | - | 53 | 75 | 86 | 63 | 85 | 96
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 4x | 9x | 14x | - | 3x | 5x | 8x | 5x | 6x | 9x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 15 | 27 | 43 | 51 | 12 | 22 | 32 | 14 | 26 | 36
GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 7x | 12x | 14x | 3x | 6x | 9x | 4x | 7x | 10x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 268 | 512 | 1,020 | 1,997 | 296 | 556 | 1,098 | 1,813
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 58x | 9x | 16x | 32x | 53x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 24 | 15 | 9 | 8 | 21 | 14 | 10
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | 8x | 10x | 4x | 6x | 8x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 46 | 26 | 15 | 11 | 39 | 23 | 15
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 3x | 6x | 10x | 13x | 4x | 6x | 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Sep2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 3.36E+08 | 6.40E+08 | 1.22E+09 | 2.31E+09 | 3.26E+08 | 6.06E+08 | 1.14E+09 | 2.04E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 6x | 12x | 22x | 3x | 6x | 11x | 20x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.09E+07 | 1.19E+08 | 2.61E+08 | 5.34E+08 | 9.66E+08 | 1.21E+08 | 2.56E+08 | 4.99E+08 | 8.80E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 5x | 10x | 19x | 2x | 5x | 10x | 17x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.41E+05 | 1.86E+06 | 3.45E+06 | 6.14E+06 | 9.78E+06 | 1.91E+06 | 3.54E+06 | 6.38E+06 | 9.88E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 9x | 17x | 31x | 49x | 9x | 18x | 32x | 49x
LAMMPS [Rhodopsin] | ATOM-Time Steps/s | Rhodopsin | yes | 7.34E+06 | - | 1.17E+07 | 2.17E+07 | 3.64E+07 | - | 1.21E+07 | 2.10E+07 | -
LAMMPS [Rhodopsin] | NRF | Rhodopsin | yes | 1x | - | 2x | 3x | 5x | - | 2x | 3x | -
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.14E+05 | 1.46E+06 | 2.90E+06 | 5.73E+06 | 1.13E+07 | 1.40E+06 | 2.79E+06 | 5.57E+06 | 1.10E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 13x | 25x | 50x | 99x | 12x | 24x | 49x | 96x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.78E+07 | 2.21E+08 | 3.89E+08 | 7.91E+08 | 1.43E+09 | 2.32E+08 | 4.35E+08 | 8.08E+08 | 1.18E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 8x | 14x | 28x | 51x | 8x | 16x | 29x | 43x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons.

VERSION

develop_d98de0c4

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 5,304 | 2,533 | 1,331 | 745 | 4,130 | 2,215 | 1,212 | 1,160
MILC | NRF | Apex Medium | yes | 1x | 14x | 29x | 56x | 99x | 18x | 33x | 61x | 64x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

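Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 8x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 94 | 186 | 374 | 744 | 60 | 119 | 238 | 474 | 97 | 194 | 384 | 771
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 10x | 20x | 39x | 3x | 6x | 12x | 25x | 5x | 10x | 20x | 40x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 97 | 196 | 391 | 783 | 63 | 127 | 253 | 507 | 101 | 202 | 402 | 801
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 10x | 20x | 40x | 3x | 6x | 13x | 26x | 5x | 10x | 20x | 41x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 128 | 256 | 512 | 1,030 | 85 | 170 | 338 | 675 | 131 | 254 | 513 | 1,028
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 6x | 12x | 25x | 50x | 4x | 8x | 16x | 33x | 6x | 12x | 25x | 50x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 8 | 15 | 31 | 61 | 5 | 10 | 20 | 40 | 8 | 16 | 32 | 64
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 9x | 17x | 34x | 3x | 6x | 11x | 23x | 5x | 9x | 18x | 36x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 8 | 16 | 32 | 63 | 5 | 11 | 21 | 42 | 8 | 17 | 33 | 65
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 9x | 18x | 35x | 3x | 6x | 12x | 23x | 5x | 9x | 18x | 36x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 9 | 18 | 36 | 70 | 6 | 12 | 25 | 50 | 9 | 18 | 36 | 73
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 9x | 18x | 36x | 3x | 6x | 13x | 25x | 5x | 9x | 19x | 37x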

Benchmarked in ensemble mode

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x V100 SXM2 32GB | 4x V100S PCIe 32GB
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 6 | 0.62 | 0.68
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 10x | 9x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

CPU 6.7; GPU 6.8

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 293 | 153 | 99 | 73 | 295 | 163 | 112 | 93
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 6x | 9x | 12x | 3x | 5x | 8x | 9x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,569 | 2,173 | - | 1,673 | 3,559 | 2,190 | - | 1,718
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | - | 9x | 4x | 7x | - | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | - | 17,797 | 12,110 | 8,463 | 34,191 | 18,472 | 12,393 | 8,695
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | - | 6x | 10x | 14x | 3x | 5x | 9x | 13x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 38,158 | 76,043 | 152,116 | 304,085 | 46,046 | 92,037 | 183,565 | 366,995
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 7x | 13x | 27x | 4x | 8x | 16x | 32x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,566 | 16,913 | 32,992 | 65,275 | 9,246 | 18,269 | 36,291 | 72,463
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 9x | 17x | 2x | 5x | 10x | 19x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,182 | 14,245 | 28,261 | 56,416 | 8,512 | 16,881 | 33,548 | 66,986
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x | 2x | 4x | 9x | 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0a5acff9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 158 | 82 | 43 | 25 | 132 | 68 | 37 | 22
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 14x | 28x | 52x | 90x | 17x | 33x | 62x | 104x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.4.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.4.2 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.4.2

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.4.2 | GROMACS Benchmark: ADH Dodec, CUDA Version: 11.4.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.4.2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.4.2


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics of biomolecules

VERSION

20.12-AT_21.10

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.40 | 63 | 126 | 251
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 14x | 29x | 57x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.41 | 64 | 129 | 257
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 15x | 29x | 58x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.80 | 308 | 616 | 1,232
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 14x | 27x | 54x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.13 | 317 | 635 | 1,270
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 14x | 27x | 55x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 97.51 | 1,141 | 2,282 | 4,564
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 23x | 47x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 99.02 | 1,179 | 2,358 | 4,715
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 48x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.42 | 22 | 44 | 87
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 15x | 31x | 62x

Benchmarked in ensemble mode

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 117 | 40 | 26
Chroma | NRF | szscl21_24_128 | yes | 1x | 10x | 28x | 44x

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 285 | 145 | 74
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 4x | 9x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.3

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 125 | 235 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 6x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 39 | 61 | 69
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 4x | 5x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 10 | 17 | 25
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 7x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 232 | 463 | 853
GTC | NRF | moi#proc.in | yes | 1x | 7x | 13x | 25x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 36 | 21 | -
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 2x | 4x | -
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 81 | 43 | 31
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 3x | 5x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons.

VERSION

develop_d98de0c4

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 7,459 | 3,854 | 2,626
MILC | NRF | Apex Medium | yes | 1x | 10x | 19x | 28x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 55 | 110 | 218
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 6x | 11x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 57 | 115 | 227
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 3x | 6x | 12x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 74 | 148 | 292
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 7x | 14x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 4 | 9 | 17
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 5x | 10x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 4 | 9 | 18
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 2x | 5x | 10x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 5 | 10 | 20
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 5x | 10x

Benchmarked in ensemble mode

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.51 | 0.90
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,377 | 2,567 | 1,815
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 6x | 8x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | 30,651 | 18,100 | 11,239
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 6x | 10x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 22,001 | 44,043 | 88,054
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 4x | 8x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 5,906 | 11,541 | 22,948
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 6x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | - | 9,799 | 19,565
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | - | 3x | 5x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0a5acff9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 240 | 122 | 65
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 9x | 19x | 35x