Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at NRF, we measure application performance with up to 8 CPU-only servers. Then we use linear scaling to scale beyond 8 servers to calculate the NRF. The NRF will vary by application.
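
The extrapolation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `node_replacement_factor`, the sample throughput numbers, and the choice to interpolate within the measured range are hypothetical and are not NVIDIA's published procedure. The one element taken from the text is that CPU-only performance is measured on up to 8 servers and scaled linearly beyond that.

```python
from typing import Dict


def node_replacement_factor(cpu_throughput: Dict[int, float],
                            gpu_throughput: float) -> float:
    """Estimate how many CPU-only servers match one GPU server.

    cpu_throughput maps a measured CPU node count to throughput
    (bigger is better) and is assumed to increase with node count.
    """
    nodes = sorted(cpu_throughput)
    prev_n, prev_t = 0, 0.0
    # Inside the measured range: interpolate between neighbouring points
    # (an assumption of this sketch, not stated in the source).
    for n in nodes:
        t = cpu_throughput[n]
        if gpu_throughput <= t:
            frac = (gpu_throughput - prev_t) / (t - prev_t)
            return prev_n + frac * (n - prev_n)
        prev_n, prev_t = n, t
    # Beyond the largest measured node count: scale linearly using the
    # per-node throughput observed at that count.
    max_n = nodes[-1]
    per_node = cpu_throughput[max_n] / max_n
    return gpu_throughput / per_node


# Hypothetical measurements on 1, 2, 4, and 8 CPU-only servers.
cpu_runs = {1: 10.0, 2: 19.0, 4: 36.0, 8: 64.0}
print(node_replacement_factor(cpu_runs, 320.0))  # beyond 8 servers
print(node_replacement_factor(cpu_runs, 27.5))   # inside measured range
```

Because real CPU clusters rarely scale perfectly, an NRF obtained this way can be larger than the plain ratio of GPU-server to single-CPU-server performance.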

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.6

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.6 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.6 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.6

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.6 | GROMACS Benchmark: Cellulose, CUDA Version: 11.6 | LAMMPS Benchmark: SNAP, CUDA Version: 11.6 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.6 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.6

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.12-AT_21.12

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 152 | 303 | 606 | 1,212 | 154 | 307 | 614 | 1,228
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 35x | 70x | 140x | 279x | 35x | 71x | 141x | 283x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 164 | 328 | 656 | 1,313 | 165 | 331 | 662 | 1,324
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 38x | 75x | 150x | 300x | 38x | 76x | 151x | 303x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 534 | 1,067 | 2,134 | 4,268 | 517 | 1,033 | 2,066 | 4,133
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 24x | 48x | 96x | 192x | 23x | 46x | 93x | 186x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 570 | 1,140 | 2,279 | 4,558 | 590 | 1,180 | 2,360 | 4,720
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 25x | 50x | 100x | 200x | 26x | 52x | 104x | 208x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 1,178 | 2,356 | 4,712 | 9,424 | 1,263 | 2,527 | 5,053 | 10,106
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 13x | 25x | 50x | 100x | 13x | 27x | 54x | 108x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 1,242 | 2,484 | 4,967 | 9,934 | 1,256 | 2,512 | 5,024 | 10,048
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 26x | 52x | 104x | 13x | 26x | 53x | 105x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 55 | 110 | 221 | 441 | 55 | 110 | 220 | 440
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 39x | 77x | 154x | 308x | 38x | 77x | 154x | 307x
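
For throughput metrics, the NRF rows in this AMBER table can be cross-checked with simple arithmetic. The sketch below divides the 1x to 8x A100 SXM ns/day figures for PME-Cellulose_NPT_4fs by the CPU-only figure (4.34 ns/day) and rounds the result, which reproduces the listed 35x, 70x, 140x, and 279x. Treat this as an observation about this row, not as the documented method: the variable names are illustrative, and other applications on this page list NRF values that differ from a plain ratio.

```python
# ns/day for AMBER PME-Cellulose_NPT_4fs, read from the table above.
cpu_ns_per_day = 4.34
sxm_ns_per_day = {1: 152, 2: 303, 4: 606, 8: 1212}  # GPUs -> ns/day

# Ratio of GPU-server throughput to one CPU-only server, rounded to
# the nearest integer for comparison with the NRF row.
ratios = {gpus: round(value / cpu_ns_per_day)
          for gpus, value in sxm_ns_per_day.items()}

for gpus, ratio in ratios.items():
    print(f"{gpus}x A100 SXM 80GB: about {ratio}x")
```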

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 36 | 20 | 11 | 7 | 44 | 25 | 13 | 9
Chroma | NRF | szscl21_24_128 | yes | 1x | 32x | 55x | 99x | 163x | 26x | 46x | 84x | 129x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 52 | 28 | 16 | 11 | 54 | 28 | 16 | 12
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 13x | 24x | 41x | 58x | 12x | 23x | 42x | 53x
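
One way to read a time-based row such as the FUN3D loop times is through multi-GPU parallel efficiency. The sketch below uses the standard definition (speedup over one GPU divided by GPU count) with the A100 SXM loop times from the table (52, 28, 16, and 11 seconds). The helper name `parallel_efficiency` is an illustrative choice, and the calculation is not part of the original benchmark report.

```python
def parallel_efficiency(times_by_gpus):
    """Return {gpu_count: efficiency} relative to the 1-GPU run.

    times_by_gpus maps GPU count to elapsed time (smaller is better).
    """
    base = times_by_gpus[1]
    # Speedup over one GPU, divided by the number of GPUs used.
    return {n: (base / t) / n for n, t in times_by_gpus.items()}


# FUN3D loop time in seconds on 1, 2, 4, and 8 A100 SXM 80GB GPUs.
fun3d_sxm = {1: 52, 2: 28, 4: 16, 8: 11}
eff = parallel_efficiency(fun3d_sxm)
for n, e in eff.items():
    print(f"{n}x A100 SXM 80GB: {e:.0%} parallel efficiency")
```

Efficiency falling as GPUs are added is expected for a fixed-size problem, since each GPU receives less work while communication and fixed overheads remain.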

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 331 | - | 598 | - | 324 | 370 | 548 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 8x | - | 14x | - | 7x | 8x | 13x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 97 | 145 | 250 | 291 | 95 | 134 | 169 | 204
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 9x | 13x | 22x | 26x | 8x | 12x | 15x | 18x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 23 | 40 | 59 | 114 | 22 | 39 | 57 | 80
GROMACS [STMV] | NRF | STMV | yes | 1x | 6x | 10x | 15x | 30x | 6x | 10x | 15x | 21x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 489 | 923 | 1,803 | 3,566 | 480 | 893 | 1,748 | 3,436
GTC | NRF | moi#proc.in | yes | 1x | 14x | 27x | 53x | 104x | 14x | 26x | 51x | 100x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,292 | 326 | 242 | 177 | 157 | 331 | 245 | 200
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 7x | 9x | 13x | 15x | 7x | 9x | 11x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160km resolution | no | 2,304 | 302 | 215 | 155 | 139 | 294 | 209 | 155
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160km resolution | yes | 1x | 8x | 11x | 15x | 17x | 8x | 11x | 15x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Sep2021 CPU / stable_29Sep2021_update2 GPU

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 5.52E+08 | 1.04E+09 | 1.93E+09 | 3.61E+09 | 5.41E+08 | 1.01E+09 | 1.32E+09 | -
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 10x | 19x | 35x | 5x | 10x | 13x | -
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.09E+07 | 2.81E+08 | 5.09E+08 | 8.84E+08 | 1.50E+09 | 2.76E+08 | 4.96E+08 | 6.72E+08 | -
LAMMPS [EAM] | NRF | EAM | yes | 1x | 6x | 10x | 17x | 30x | 5x | 10x | 13x | -
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.41E+05 | 3.05E+06 | 5.35E+06 | 8.82E+06 | 1.25E+07 | 3.08E+06 | 5.45E+06 | 8.17E+06 | 1.14E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 15x | 27x | 44x | 62x | 15x | 27x | 41x | 57x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.14E+05 | 2.10E+06 | 4.17E+06 | 8.22E+06 | 1.57E+07 | 2.05E+06 | 4.11E+06 | 8.10E+06 | 1.57E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 18x | 37x | 72x | 137x | 18x | 36x | 71x | 137x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.78E+07 | 4.64E+08 | 8.35E+08 | 1.52E+09 | 2.37E+09 | 4.50E+08 | 7.90E+08 | 8.88E+08 | -
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 17x | 30x | 54x | 85x | 16x | 28x | 32x | -

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons.

VERSION

develop_3971e182

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 2,291 | 1,342 | 703 | 399 | 2,357 | 1,369 | 676 | 664
MILC | NRF | Apex Medium | yes | 1x | 32x | 55x | 105x | 186x | 31x | 54x | 110x | 112x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 122 | 244 | 489 | 975 | 124 | 247 | 494 | 988
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 6x | 13x | 26x | 51x | 6x | 13x | 26x | 52x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 125 | 250 | 501 | 974 | 126 | 252 | 504 | 1,009
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 6x | 13x | 25x | 50x | 6x | 13x | 26x | 51x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 164 | 327 | 657 | 1,305 | 164 | 330 | 654 | 1,312
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 8x | 16x | 32x | 63x | 8x | 16x | 32x | 64x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 12 | 24 | 47 | 94 | 12 | 24 | 47 | 94
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 13x | 26x | 53x | 7x | 13x | 27x | 53x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 12 | 24 | 49 | 95 | 12 | 24 | 49 | 97
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 7x | 13x | 27x | 53x | 7x | 13x | 27x | 54x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 14 | 27 | 54 | 109 | 14 | 27 | 54 | 109
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 7x | 14x | 27x | 56x | 7x | 14x | 28x | 56x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V6.7 CPU; V7.0 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 118 | 77 | 53 | 42 | 120 | 75 | 53 | 47
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 7x | 10x | 15x | 19x | 7x | 11x | 15x | 17x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,110 | 1,781 | 1,458 | 1,313 | 3,401 | 1,994 | 1,838 | 1,735
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 5x | 8x | 10x | 11x | 4x | 7x | 8x | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | 22,960 | 12,416 | 8,587 | 6,299 | 25,275 | 13,414 | 9,583 | 7,266
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 7x | 11x | 15x | 4x | 7x | 10x | 13x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 89,414 | 178,194 | 356,418 | 712,731 | 85,271 | 170,060 | 339,993 | 679,883
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 31x | 63x | 8x | 15x | 30x | 60x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 12,828 | 25,462 | 50,863 | 100,871 | 12,864 | 25,688 | 51,313 | 102,502
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x | 7x | 13x | 27x | 3x | 7x | 14x | 27x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 13,910 | 27,437 | 54,323 | 107,533 | 13,589 | 27,049 | 53,755 | 107,254
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 14x | 29x | 4x | 7x | 14x | 28x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_44e098a3

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,920 | 76 | 40 | 22 | 14 | 77 | 40 | 21 | 16
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 29x | 54x | 101x | 155x | 28x | 55x | 102x | 135x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.6

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | ICON Benchmark: QUBICC 160km resolution, CUDA Version: 11.6 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.6 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.6

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.6 | GROMACS Benchmark: Cellulose, CUDA Version: 11.6 | LAMMPS Benchmark: SNAP, CUDA Version: 11.6 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.6 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.6

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2


Detailed A30 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.12-AT_21.12

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 83 | 166 | 332 | 664
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 19x | 38x | 76x | 153x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 86 | 173 | 346 | 691
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 20x | 40x | 79x | 158x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 339 | 679 | 1,357 | 2,714
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 15x | 31x | 61x | 122x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 360 | 719 | 1,438 | 2,876
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 16x | 32x | 63x | 126x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 938 | 1,875 | 3,751 | 7,501
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 10x | 20x | 40x | 80x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 968 | 1,937 | 3,874 | 7,747
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 10x | 20x | 41x | 81x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 30 | 60 | 120 | 240
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 21x | 42x | 84x | 168x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A30 | 4x A30 | 8x A30
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 35 | 18 | 11
Chroma | NRF | szscl21_24_128 | yes | 1x | 33x | 62x | 103x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 111 | 55 | 29 | 18
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 12x | 22x | 37x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 187 | 216 | 370 | 457
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 4x | 5x | 8x | 10x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 52 | 75 | 118 | 145
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 5x | 10x | 13x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 12 | 21 | 34 | 55
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 5x | 9x | 14x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 276 | 523 | 1,030 | 2,040
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 59x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,292 | 573 | 376 | 262 | -
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 4x | 6x | 9x | -
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160km resolution | no | 2,304 | 508 | 315 | 223 | 201
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160km resolution | yes | 1x | 5x | 7x | 10x | 11x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Sep2021 CPU / stable_29Sep2021_update2 GPU

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 2.73E+08 | 5.26E+08 | 8.72E+08 | 1.39E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 5x | 8x | 13x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.09E+07 | 1.32E+08 | 2.50E+08 | 4.31E+08 | 7.01E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 5x | 8x | 14x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.41E+05 | 1.71E+06 | 3.20E+06 | 5.49E+06 | 8.39E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 8x | 16x | 27x | 42x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.14E+05 | 1.09E+06 | 2.17E+06 | 4.25E+06 | 8.29E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 9x | 19x | 37x | 73x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.78E+07 | 2.27E+08 | 4.32E+08 | 6.63E+08 | 9.18E+08
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 8x | 16x | 24x | 33x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons.

VERSION

develop_3971e182

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 5,279 | 2,341 | 1,202 | 758
MILC | NRF | Apex Medium | yes | 1x | 14x | 32x | 62x | 98x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 74 | 148 | 295 | 584
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 4x | 8x | 15x | 31x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 76 | 153 | 305 | 608
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 4x | 8x | 16x | 31x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 93 | 185 | 363 | 744
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 5x | 9x | 18x | 36x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 6 | 13 | 25 | 51
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 7x | 14x | 28x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 7 | 13 | 26 | 52
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 7x | 14x | 29x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 7 | 14 | 28 | 56
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 7x | 14x | 29x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V6.7 CPU; V7.0 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 253 | 114 | 73 | 55
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 7x | 11x | 14x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,922 | 2,220 | 1,905 | 1,676
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 8x | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | 35,026 | 18,221 | 11,868 | 8,110
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 5x | 8x | 11x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 44,065 | 87,858 | 175,636 | 350,462
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 4x | 8x | 16x | 31x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 6,707 | 13,332 | 26,386 | 52,781
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 7x | 14x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,034 | 13,865 | 27,489 | 54,788
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_44e098a3

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,920 | 156 | 79 | 41 | 23
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 14x | 28x | 54x | 96x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.6

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.6 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.6

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.6 | GROMACS Benchmark: STMV, CUDA Version: 11.6 | LAMMPS Benchmark: SNAP, CUDA Version: 11.6 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.6 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.6


Detailed A40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.12-AT_21.12

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 91 | 181 | 362 | 724
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 21x | 42x | 83x | 167x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 97 | 193 | 386 | 772
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 22x | 44x | 88x | 177x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 425 | 850 | 1,700 | 3,400
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 19x | 38x | 76x | 153x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 446 | 892 | 1,784 | 3,569
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 20x | 39x | 78x | 157x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 1,058 | 2,116 | 4,231 | 8,463
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 11x | 23x | 45x | 90x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 1,057 | 2,115 | 4,230 | 8,459
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 22x | 44x | 89x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 32 | 64 | 129 | 257
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 22x | 45x | 90x | 180x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 78 | 41 | 22 | 13
Chroma | NRF | szscl21_24_128 | yes | 1x | 15x | 28x | 52x | 89x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 226 | 114 | 58 | 32
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 11x | 21x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 302 | 336 | 475 | 564
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 7x | 8x | 11x | 13x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 69 | 107 | 142 | 169
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 5x | 9x | 13x | 15x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 18 | 36 | 53 | 62
GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 9x | 14x | 16x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 279 | 523 | 1,033 | 2,032
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 59x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,292 | 763 | 447 | 299 | 259
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | 8x | 9x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160km resolution | no | 2,304 | 761 | 434 | 277 | 224
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160km resolution | yes | 1x | 3x | 5x | 8x | 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Sep2021 CPU / stable_29Sep2021_update2 GPU

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A40 | 4x A40 | 8x A40
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 2.16E+08 | 3.74E+08 | 7.04E+08
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 4x | 7x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.09E+07 | 1.00E+08 | 1.77E+08 | 3.33E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 3x | 7x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.41E+05 | 7.52E+05 | 1.39E+06 | 2.37E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 3x | 5x | 12x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.14E+05 | 4.90E+05 | 9.79E+05 | 1.95E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 5x | 9x | 17x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.78E+07 | 8.81E+07 | 1.64E+08 | 3.02E+08
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 3x | 6x | 11x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons.

VERSION

develop_3971e182

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 6,020 | 3,209 | 1,754 | 1,041
MILC | NRF | Apex Medium | yes | 1x | 12x | 23x | 42x | 71x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 86 | 172 | 346 | 692
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 9x | 18x | 36x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 91 | 183 | 368 | 737
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 9x | 19x | 38x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 130 | 262 | 527 | 1,055
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 6x | 13x | 26x | 51x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 7 | 14 | 29 | 58
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 16x | 33x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 8 | 15 | 30 | 61
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 8x | 17x | 34x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 9 | 19 | 37 | 75
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 10x | 19x | 38x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,810 | 2,165 | 1,818 | 1,678
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 8x | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | 25,429 | 11,993 | 8,385 | 6,005
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 8x | 11x | 15x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 31,012 | 61,868 | 123,719 | 247,466
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 5x | 11x | 22x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,627 | 17,056 | 33,782 | 67,431
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 5x | 9x | 18x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 6,403 | 12,726 | 25,292 | 50,516
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 3x | 7x | 13x
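
The RTM rows above also show how well throughput scales with GPU count. A minimal sketch, assuming the usual definition of parallel efficiency (throughput on N GPUs divided by N times the single-GPU throughput); the helper name and data layout are illustrative, and the numbers are the A40 Mcells/s values for two of the test modules.

```python
# Multi-GPU scaling efficiency for a "bigger is better" throughput metric.
# Values are the RTM "Mcells/s" figures for 1x/2x/4x/8x A40 from the rows above.
rtm_mcells_per_s = {
    "Isotropic Radius 4": {1: 31_012, 2: 61_868, 4: 123_719, 8: 247_466},
    "TTI RX 2Pass mgpu": {1: 6_403, 2: 12_726, 4: 25_292, 8: 50_516},
}


def scaling_efficiency(throughput_by_gpus):
    """Map GPU count -> fraction of ideal linear scaling (1.0 = perfect)."""
    single_gpu = throughput_by_gpus[1]
    return {
        gpus: throughput / (gpus * single_gpu)
        for gpus, throughput in throughput_by_gpus.items()
    }


efficiency = {
    name: scaling_efficiency(values) for name, values in rtm_mcells_per_s.items()
}

for name, by_gpus in efficiency.items():
    for gpus, fraction in by_gpus.items():
        print(f"{name}, {gpus}x A40: {fraction:.1%} of linear scaling")
```

Both modules stay close to linear scaling at 8 GPUs on these figures. The same helper applied to a time-based metric would need the ratio inverted.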

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_44e098a3

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,920 | 204 | 103 | 53 | 29
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 11x | 21x | 42x | 76x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.6

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | ICON Benchmark: QUBICC 160 km resolution, CUDA Version: 11.6 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.6 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.6

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.6 | GROMACS Benchmark: Cellulose, CUDA Version: 11.6 | LAMMPS Benchmark: SNAP, CUDA Version: 11.6 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.6 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.6

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.4.2


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.12-AT_21.12

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 93 | 185 | 370 | 741 | 96 | 191 | 382 | 765
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 21x | 43x | 85x | 171x | 22x | 44x | 88x | 176x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 99 | 198 | 397 | 794 | 102 | 205 | 409 | 818
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 23x | 45x | 91x | 182x | 23x | 47x | 94x | 187x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 407 | 815 | 1,629 | 3,259 | 421 | 841 | 1,683 | 3,365
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 18x | 37x | 73x | 147x | 19x | 38x | 76x | 151x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 436 | 872 | 1,743 | 3,487 | 449 | 899 | 1,797 | 3,595
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 19x | 38x | 77x | 153x | 20x | 40x | 79x | 158x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 1,097 | 2,195 | 4,389 | 8,779 | 1,128 | 2,257 | 4,513 | 9,026
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 23x | 47x | 93x | 12x | 24x | 48x | 96x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 1,163 | 2,327 | 4,653 | 9,307 | 1,208 | 2,415 | 4,831 | 9,661
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 49x | 98x | 13x | 25x | 51x | 101x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 32 | 63 | 127 | 253 | 33 | 65 | 130 | 261
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 22x | 44x | 89x | 177x | 23x | 46x | 91x | 182x
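
An ns/day figure can be turned into an estimated wall-clock duration for a planned run. The sketch below does this for the PME-Cellulose_NPT_4fs rates in the rows above; the 1,000 ns (1 microsecond) target is a hypothetical simulation length chosen purely for illustration, and the estimate assumes the benchmark rate holds for the whole run.

```python
# Convert a molecular dynamics rate (ns/day) into wall-clock days for a target length.
# Rates are the AMBER PME-Cellulose_NPT_4fs "ns/day" values from the rows above.
ns_per_day = {
    "Dual Cascade Lake 6240 (CPU-Only)": 4.34,
    "1x V100 SXM2 32GB": 93,
    "1x V100S PCIe 32GB": 96,
}

# Hypothetical target: 1 microsecond of simulated time (an assumption, not from the page).
target_ns = 1_000


def days_to_simulate(target_length_ns, rate_ns_per_day):
    """Return the wall-clock days needed to simulate target_length_ns."""
    return target_length_ns / rate_ns_per_day


wall_clock_days = {
    config: days_to_simulate(target_ns, rate) for config, rate in ns_per_day.items()
}

for config, days in wall_clock_days.items():
    print(f"{config}: about {days:.1f} days for {target_ns} ns")
```

On these figures the CPU-only server would need several months for the hypothetical run, while a single V100 would need under two weeks.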

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 165 | 31 | 17 | 10 | 142 | 28 | 15 | 13
Chroma | NRF | szscl21_24_128 | yes | 1x | 7x | 37x | 68x | 111x | 8x | 41x | 77x | 85x

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 99 | 50 | 26 | 15 | 87 | 45 | 23 | 14
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 6x | 13x | 25x | 43x | 7x | 15x | 28x | 46x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
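Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 219 | 255 | 466 | - | 220 | 251 | 316 | 230 | 263 | 324
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 6x | 11x | - | 5x | 6x | 7x | 5x | 6x | 7x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 60 | 92 | 149 | - | 53 | 81 | - | 63 | 92 | 96
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 4x | 8x | 13x | - | 3x | 6x | - | 4x | 8x | 9x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 14 | 27 | 43 | 53 | 12 | 23 | 31 | 14 | 26 | 36
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 7x | 11x | 14x | 3x | 6x | 8x | 3x | 7x | 9x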

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 268 | 512 | 1,020 | 1,997 | 296 | 556 | 1,098 | 1,813
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 30x | 58x | 9x | 16x | 32x | 53x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,292 | 596 | 378 | 236 | 194 | 511 | 345 | 245
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 4x | 6x | 10x | 12x | 4x | 7x | 9x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160km resolution | no | 2,304 | 525 | 323 | 201 | 160 | 455 | 338 | 199
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160km resolution | yes | 1x | 4x | 7x | 11x | 14x | 5x | 7x | 12x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_29Sep2021 CPU / stable_29Sep2021_update2 GPU

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.07E+08 | 3.29E+08 | 6.33E+08 | 1.23E+09 | 2.29E+09 | 3.30E+08 | 6.14E+08 | 1.14E+09 | 1.87E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 6x | 12x | 22x | 3x | 6x | 11x | 18x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.09E+07 | 1.19E+08 | 2.61E+08 | 5.38E+08 | 9.68E+08 | 1.22E+08 | 2.59E+08 | 4.97E+08 | 8.11E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 5x | 11x | 19x | 2x | 5x | 10x | 16x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.41E+05 | 1.86E+06 | 3.46E+06 | 6.22E+06 | 9.60E+06 | 1.91E+06 | 3.57E+06 | 6.42E+06 | 9.95E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 9x | 17x | 31x | 48x | 9x | 18x | 32x | 49x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.14E+05 | 1.42E+06 | 2.84E+06 | 5.73E+06 | 1.14E+07 | 1.40E+06 | 2.80E+06 | 5.60E+06 | 1.10E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 12x | 25x | 50x | 99x | 12x | 25x | 49x | 97x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.78E+07 | 2.21E+08 | 4.06E+08 | 7.97E+08 | 1.43E+09 | 2.33E+08 | 4.35E+08 | 8.06E+08 | 1.18E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 8x | 15x | 29x | 52x | 8x | 16x | 29x | 42x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_3971e182

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 5,302 | 2,555 | 1,335 | 753 | 4,079 | 2,186 | 1,194 | 1,111
MILC | NRF | Apex Medium | yes | 1x | 14x | 29x | 56x | 98x | 18x | 34x | 62x | 67x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

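Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 8x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 94 | 187 | 374 | 749 | 60 | 118 | 238 | 476 | 97 | 194 | 387 | 770
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 10x | 20x | 39x | 3x | 6x | 12x | 25x | 5x | 10x | 20x | 40x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 98 | 195 | 393 | 787 | 63 | 127 | 254 | 507 | 101 | 201 | 401 | 804
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 10x | 20x | 40x | 3x | 6x | 13x | 26x | 5x | 10x | 20x | 41x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 128 | 256 | 514 | 1,028 | 85 | 169 | 339 | 679 | 130 | 258 | 517 | 1,034
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 6x | 12x | 25x | 50x | 4x | 8x | 16x | 33x | 6x | 12x | 25x | 50x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 8 | 15 | 31 | 61 | 5 | 10 | 20 | 40 | 8 | 16 | 32 | 63
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 8x | 17x | 34x | 3x | 6x | 11x | 23x | 5x | 9x | 18x | 36x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 8 | 16 | 32 | 64 | 5 | 10 | 21 | 42 | 8 | 17 | 33 | 65
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 9x | 18x | 35x | 3x | 6x | 12x | 23x | 5x | 9x | 18x | 36x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 9 | 18 | 35 | 70 | 6 | 12 | 25 | 49 | 9 | 18 | 36 | 71
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 9x | 18x | 36x | 3x | 6x | 13x | 25x | 5x | 9x | 19x | 36x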

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x V100 SXM2 32GB | 4x V100S PCIe 32GB
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 6 | 0.62 | 0.68
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 10x | 9x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

V6.7 CPU; V7.0 GPU

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 718 | 279 | 139 | 90 | 65 | 322 | 139 | 98 | 83
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 6x | 9x | 12x | 2x | 6x | 8x | 10x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,569 | 2,173 | - | 1,673 | 3,559 | 2,190 | - | 1,718
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | - | 9x | 4x | 7x | - | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | - | 17,797 | 12,110 | 8,463 | 34,191 | 18,472 | 12,393 | 8,695
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | - | 5x | 8x | 11x | 3x | 5x | 7x | 11x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 38,151 | 76,014 | 152,084 | 304,149 | 46,019 | 91,776 | 183,699 | 367,306
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 7x | 13x | 27x | 4x | 8x | 16x | 32x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,556 | 16,888 | 33,104 | 65,698 | 9,192 | 18,272 | 36,234 | 72,341
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 9x | 17x | 2x | 5x | 10x | 19x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,161 | 14,193 | 28,183 | 56,228 | 8,506 | 16,845 | 33,449 | 66,707
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x | 2x | 4x | 9x | 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_44e098a3

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,920 | 158 | 82 | 43 | 25 | 132 | 68 | 37 | 22
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 14x | 27x | 50x | 87x | 17x | 32x | 60x | 100x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.6

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.6 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.6

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.6 | GROMACS Benchmark: ADH Dodec, CUDA Version: 11.6 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.6 | Relion Benchmark: Plasmodium Ribosome, CUDA Version: 11.4.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.6


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.12-AT_21.12

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 63 | 127 | 254
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 15x | 29x | 58x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 63 | 126 | 252
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 14x | 29x | 58x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 302 | 603 | 1,207
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 14x | 27x | 54x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 321 | 642 | 1,284
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 14x | 28x | 56x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.90 | 1,107 | 2,215 | 4,429
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 47x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 1,130 | 2,261 | 4,521
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 47x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 21 | 42 | 84
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 15x | 29x | 59x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 117 | 40 | 26
Chroma | NRF | szscl21_24_128 | yes | 1x | 10x | 28x | 44x

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology applications.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 285 | 144 | 74
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 4x | 9x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2022

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 127 | 232 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 5x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 39 | 62 | 72
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 4x | 5x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 10 | 17 | 26
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 6x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 232 | 463 | 853
GTC | NRF | moi#proc.in | yes | 1x | 7x | 13x | 25x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.5_RC

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,292 | 995 | 559 | 424
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 2x | 4x | 5x
ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | QUBICC 160km resolution | no | 2,304 | 994 | 583 | 365
ICON [QUBICC 160 km resolution] | NRF | QUBICC 160km resolution | yes | 1x | 2x | 4x | 6x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_3971e182

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 7,468 | 3,904 | 2,154
MILC | NRF | Apex Medium | yes | 1x | 10x | 19x | 34x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V 3.0a9 ; Intel CPU V 2.15a AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 55 | 110 | 218
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 6x | 11x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 57 | 115 | 228
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 3x | 6x | 12x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 74 | 147 | 292
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 7x | 14x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 4 | 9 | 17
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 2x | 5x | 10x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.80 | 4 | 9 | 18
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 2x | 5x | 10x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 5 | 10 | 20
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 5x | 10x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.51 | 0.90
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 14,862 | 3,377 | 2,567 | 1,815
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 6x | 8x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 92,404 | 30,651 | 18,100 | 11,239
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 5x | 8x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 22,058 | 44,050 | 88,199
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 2x | 4x | 8x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 5,871 | 11,515 | 22,892
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 6x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | - | 9,799 | 19,565
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | - | 3x | 5x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_44e098a3

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,920 | 239 | 122 | 65
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 9x | 18x | 34x