Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers. The NRF varies by application.
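
As a rough illustration of the definition above, the sketch below estimates an NRF as a plain performance ratio between one GPU-accelerated server and one CPU-only server. It assumes perfectly linear CPU scaling for every server count, which is a simplification: the published NRFs rely on measured results for up to 8 CPU-only servers, so they can differ from this ratio. The function name and its parameters are hypothetical and are not part of any NVIDIA tool.

```python
def node_replacement_factor(cpu_value, gpu_value, bigger_is_better=True):
    """Estimate NRF as a performance ratio, assuming linear CPU scaling.

    cpu_value: metric measured on one CPU-only server
    gpu_value: same metric measured on one GPU-accelerated server
    bigger_is_better: True for throughput metrics (e.g. ns/day),
                      False for elapsed-time metrics (e.g. seconds)
    """
    if cpu_value <= 0 or gpu_value <= 0:
        raise ValueError("metric values must be positive")
    if bigger_is_better:
        # Throughput: the GPU server does gpu_value / cpu_value times the work.
        return gpu_value / cpu_value
    # Elapsed time: a shorter run means proportionally more servers replaced.
    return cpu_value / gpu_value


# AMBER PME-Cellulose_NPT_4fs from the A100 table: 4.47 ns/day on the
# CPU-only server and 152 ns/day on 1x A100 SXM 80GB.
print(round(node_replacement_factor(4.47, 152)))  # prints 34

# Made-up elapsed times (not taken from the tables): 1000 s vs 50 s.
print(node_replacement_factor(1000.0, 50.0, bigger_is_better=False))  # prints 20.0
```

For time-based metrics the ratio is inverted because a smaller value is better. Where a published NRF differs from this simple ratio, the difference most likely reflects measured, non-linear CPU scaling.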

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.3.1

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.3.1 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.3.1

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.3.1 | GROMACS Benchmark: Cellulose, CUDA Version: 11.3.1 | LAMMPS Benchmark: SNAP, CUDA Version: 11.3.1 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.3.1 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.3.1

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.3.1 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.3.1 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.3.1


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.11-AT_21.03

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.47 | 152 | 304 | 608 | 1,215 | 149 | 297 | 594 | 1,188
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 34x | 68x | 136x | 272x | 33x | 66x | 133x | 266x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.47 | 165 | 329 | 658 | 1,317 | 163 | 326 | 653 | 1,306
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 37x | 74x | 147x | 295x | 37x | 73x | 146x | 292x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.91 | 544 | 1,089 | 2,177 | 4,354 | 540 | 1,081 | 2,162 | 4,323
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 24x | 48x | 95x | 190x | 24x | 47x | 94x | 189x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.22 | 585 | 1,170 | 2,341 | 4,682 | 585 | 1,171 | 2,342 | 4,683
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 25x | 50x | 101x | 202x | 25x | 50x | 101x | 202x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.58 | 1,231 | 2,461 | 4,923 | 9,846 | 1,229 | 2,458 | 4,917 | 9,834
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 13x | 25x | 51x | 102x | 13x | 25x | 51x | 102x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 1,296 | 2,592 | 5,184 | 10,369 | 1,300 | 2,600 | 5,200 | 10,400
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 26x | 53x | 106x | 13x | 26x | 53x | 106x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.46 | 56 | 112 | 224 | 448 | 54 | 108 | 216 | 432
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 38x | 77x | 153x | 307x | 37x | 74x | 148x | 296x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 36 | 20 | 11 | 7 | 44 | 25 | 13 | 9
Chroma | NRF | szscl21_24_128 | yes | 1x | 32x | 55x | 99x | 163x | 26x | 46x | 84x | 129x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 52 | 27 | 16 | 11 | 54 | 28 | 17 | 12
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 13x | 24x | 41x | 58x | 12x | 23x | 39x | 53x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 339 | - | 552 | - | 335 | - | 508 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 8x | - | 13x | - | 8x | - | 12x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 100 | 140 | 242 | 264 | 97 | 130 | 168 | 184
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 9x | 13x | 23x | 25x | 9x | 12x | 16x | 17x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 23 | 39 | 58 | 113 | 23 | 38 | 55 | 74
GROMACS [STMV] | NRF | STMV | yes | 1x | 6x | 11x | 16x | 32x | 6x | 11x | 15x | 21x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 490 | 929 | 1,816 | 3,607 | 482 | 876 | 1,496 | 2,688
GTC | NRF | moi#proc.in | yes | 1x | 14x | 27x | 53x | 105x | 14x | 26x | 44x | 78x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 13 | 9 | 7 | 6 | 14 | 11
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 6x | 9x | 11x | 13x | 6x | 7x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 25 | 15 | 10 | 8 | 26 | 17
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 6x | 10x | 15x | 18x | 6x | 9x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_10Feb2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.23E+08 | 5.79E+08 | 1.06E+09 | 2.11E+09 | 3.74E+09 | 5.47E+08 | 1.01E+09 | 1.83E+09 | 2.47E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 9x | 18x | 31x | 5x | 8x | 15x | 21x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 2.86E+08 | 5.12E+08 | 9.69E+08 | 1.59E+09 | 2.75E+08 | 5.07E+08 | 8.46E+08 | 1.33E+09
LAMMPS [EAM] | NRF | EAM | yes | 1x | 5x | 9x | 17x | 27x | 5x | 9x | 15x | 23x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.94E+05 | 3.13E+06 | 5.55E+06 | 8.85E+06 | 1.25E+07 | 3.13E+06 | 5.46E+06 | 8.44E+06 | 1.13E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 13x | 24x | 38x | 54x | 13x | 24x | 36x | 49x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.20E+05 | 1.89E+06 | 3.69E+06 | 7.11E+06 | 1.31E+07 | 1.89E+06 | 3.70E+06 | 7.12E+06 | 1.32E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 16x | 31x | 59x | 109x | 16x | 31x | 59x | 110x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.59E+07 | 4.86E+08 | 8.48E+08 | 1.55E+09 | 2.17E+09 | 4.76E+08 | 8.04E+08 | 1.26E+09 | 1.91E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 11x | 18x | 34x | 47x | 10x | 17x | 27x | 41x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_c30ed15e

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 2,262 | 1,323 | 700 | 392 | 2,530 | 1,357 | 747 | 703
MILC | NRF | Apex Medium | yes | 1x | 33x | 57x | 108x | 192x | 30x | 55x | 101x | 107x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V3.0a9; Intel CPU V2.15 alpha AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU,

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 17.28 | 121 | 238 | 477 | 959 | 121 | 240 | 479 | 963
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 7x | 14x | 28x | 55x | 7x | 14x | 28x | 56x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 16.78 | 119 | 233 | 463 | 917 | 117 | 234 | 468 | 933
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 7x | 14x | 28x | 55x | 7x | 14x | 28x | 56x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 18.04 | 165 | 324 | 651 | 1,296 | 161 | 325 | 646 | 1,291
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 9x | 18x | 36x | 72x | 9x | 18x | 36x | 72x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.77 | 12 | 24 | 47 | 95 | 12 | 23 | 47 | 93
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 13x | 26x | 53x | 7x | 13x | 26x | 53x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.86 | 12 | 24 | 49 | 96 | 12 | 24 | 47 | 95
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 6x | 13x | 26x | 52x | 6x | 13x | 25x | 51x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.83 | 14 | 27 | 54 | 107 | 14 | 27 | 54 | 108
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 7x | 15x | 30x | 59x | 7x | 15x | 29x | 59x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.7

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 138 | 90 | 64 | 53 | 157 | 104 | 87 | 79
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 6x | 9x | 13x | 16x | 5x | 8x | 10x | 11x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 3,467 | 1,923 | 1,529 | 1,350 | 3,885 | 2,285 | 1,919 | -
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 9x | 10x | 3x | 6x | 7x | -
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | 23,745 | 12,378 | 8,292 | 6,057 | 26,762 | 14,088 | 9,824 | 7,418
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 8x | 12x | 17x | 3x | 6x | 10x | 14x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 89,676 | 178,367 | 357,291 | 713,312 | 89,717 | 178,944 | 357,872 | 715,471
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 32x | 63x | 8x | 16x | 32x | 63x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 12,465 | 24,760 | 49,338 | 98,305 | 12,586 | 25,072 | 50,090 | 100,037
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x | 7x | 13x | 26x | 3x | 7x | 13x | 27x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 13,899 | 27,483 | 54,644 | 108,052 | 13,709 | 27,247 | 54,164 | 107,883
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 14x | 29x | 4x | 7x | 14x | 29x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 77 | 40 | 22 | 14 | 77 | 41 | 23 | 17
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 30x | 56x | 105x | 158x | 29x | 56x | 100x | 137x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.3.1

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.3.1

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.3.1 | GROMACS Benchmark: Cellulose, CUDA Version: 11.3.1 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.3.1 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.3.1

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.3.1

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.3.1 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.3.1


Detailed A30 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.11-AT_21.03

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.47 | 81 | 163 | 326 | 651
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 18x | 36x | 73x | 146x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.47 | 86 | 172 | 344 | 687
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 19x | 38x | 77x | 154x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.91 | 339 | 678 | 1,357 | 2,713
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 15x | 30x | 59x | 118x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.22 | 358 | 716 | 1,433 | 2,865
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 31x | 62x | 123x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.58 | 908 | 1,816 | 3,633 | 7,266
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 9x | 19x | 38x | 75x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 958 | 1,915 | 3,831 | 7,662
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 10x | 20x | 39x | 78x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.46 | 30 | 59 | 119 | 237
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 20x | 41x | 81x | 162x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A30 | 4x A30 | 8x A30
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 35 | 18 | 11
Chroma | NRF | szscl21_24_128 | yes | 1x | 33x | 62x | 103x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 111 | 56 | 30 | 18
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 12x | 22x | 36x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 193 | - | 354 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | - | 8x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 52 | 74 | 111 | 126
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 5x | 10x | 12x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 12 | 21 | 33 | 51
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 5x | 9x | 14x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 272 | 511 | 926 | 1,731
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 27x | 50x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 24 | 16 | - | -
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | - | -
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 47 | 28 | 23 | 19
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 3x | 5x | 6x | 8x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_10Feb2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.23E+08 | 2.80E+08 | 5.37E+08 | 1.02E+09 | 1.51E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 4x | 8x | 13x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 1.34E+08 | 2.55E+08 | 4.64E+08 | 7.96E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 4x | 8x | 14x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.94E+05 | 1.75E+06 | 3.24E+06 | 5.48E+06 | 8.13E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 6x | 14x | 24x | 35x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.20E+05 | 1.02E+06 | 2.01E+06 | 3.92E+06 | 7.36E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 8x | 17x | 33x | 61x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.59E+07 | 2.35E+08 | 4.19E+08 | 7.38E+08 | 1.18E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 9x | 16x | 26x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_c30ed15e

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 5,844 | 2,347 | 1,255 | 713
MILC | NRF | Apex Medium | yes | 1x | 13x | 32x | 60x | 106x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V3.0a9; Intel CPU V2.15 alpha AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU,

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 17.28 | 73 | 146 | 292 | 586
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 4x | 8x | 17x | 34x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 16.78 | 76 | 152 | 302 | 607
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 9x | 18x | 36x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 18.04 | 93 | 185 | 369 | 738
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 5x | 10x | 20x | 41x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.77 | 6 | 13 | 25 | 50
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 7x | 14x | 28x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.86 | 6 | 13 | 26 | 52
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 3x | 7x | 14x | 28x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.83 | 7 | 14 | 28 | 56
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 4x | 8x | 15x | 31x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.7

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 294 | 136 | 106
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 6x | 8x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 5,343 | 2,879 | 2,132 | 1,823
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 2x | 5x | 6x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | 37,590 | 19,473 | 12,569 | 8,604
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 2x | 4x | 8x | 12x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 44,121 | 87,870 | 175,826 | 351,123
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 4x | 8x | 16x | 31x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 6,505 | 12,969 | 25,794 | 51,267
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 7x | 14x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,013 | 13,912 | 27,505 | 54,801
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 156 | 80 | 42 | 24
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 15x | 28x | 54x | 96x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.3.1

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.3.1

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.3.1 | GROMACS Benchmark: Cellulose, CUDA Version: 11.3.1 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.3.1 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.3.1

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.3.1

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.3.1


Detailed A40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.11-AT_21.03

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.47 | 95 | 190 | 381 | 762
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 21x | 43x | 85x | 170x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.47 | 102 | 203 | 406 | 813
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 23x | 45x | 91x | 182x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.91 | 423 | 847 | 1,693 | 3,386
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 18x | 37x | 74x | 148x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.22 | 450 | 901 | 1,801 | 3,602
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 19x | 39x | 78x | 155x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.58 | 1,027 | 2,053 | 4,106 | 8,212
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 11x | 21x | 43x | 85x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 1,078 | 2,155 | 4,310 | 8,621
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 22x | 44x | 88x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.46 | 34 | 69 | 137 | 274
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 23x | 47x | 94x | 188x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 78 | 41 | 22 | 13
Chroma | NRF | szscl21_24_128 | yes | 1x | 15x | 28x | 52x | 89x

FUN3D

Engineering

Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 226 | 115 | 59 | 32
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 11x | 20x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 306 | 323 | 498 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 7x | 8x | 12x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 74 | 108 | 146 | 164
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 5x | 10x | 14x | 15x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 18 | 34 | 50 | 63
GROMACS [STMV] | NRF | STMV | yes | 1x | 5x | 9x | 14x | 18x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 334 | 619 | 1,091 | 2,017
GTC | NRF | moi#proc.in | yes | 1x | 10x | 18x | 32x | 59x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 29 | 18 | - | 16
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 4x | - | 5x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 66 | 36 | 27 | 21
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 4x | 5x | 7x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_10Feb2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A40 | 4x A40 | 8x A40
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.23E+08 | 2.12E+08 | 4.05E+08 | 7.16E+08
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 3x | 6x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 1.01E+08 | 1.90E+08 | 3.50E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 3x | 6x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.94E+05 | 7.59E+05 | 1.38E+06 | 2.33E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 2x | 5x | 10x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.20E+05 | 4.85E+05 | 9.65E+05 | 1.91E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 4x | 8x | 16x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.59E+07 | 9.18E+07 | 1.78E+08 | 3.32E+08
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 2x | 4x | 7x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_c30ed15e

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 5,806 | 3,015 | 1,587 | 924
MILC | NRF | Apex Medium | yes | 1x | 13x | 25x | 47x | 81x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V3.0a9; Intel CPU V2.15 alpha AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 17.28 | 87 | 173 | 351 | 696
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 10x | 20x | 40x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 16.78 | 92 | 182 | 368 | 735
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 5x | 11x | 22x | 44x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 18.04 | 132 | 261 | 530 | 1,054
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 7x | 14x | 29x | 58x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.77 | 8 | 15 | 31 | 61
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 9x | 17x | 35x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.86 | 8 | 16 | 32 | 63
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 8x | 17x | 34x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.83 | 10 | 20 | 39 | 78
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 11x | 21x | 43x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 5,149 | 2,810 | 2,104 | 1,815
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 5x | 6x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | 27,257 | 14,174 | 9,931 | 6,992
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 6x | 10x | 15x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 36,696
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 10,021
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,369
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 174 | 89 | 47 | 26
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 13x | 25x | 49x | 87x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.3.1

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.3.1 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.3.1

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.3.1 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.3.1 | GROMACS Benchmark: Cellulose, CUDA Version: 11.3.1 | LAMMPS Benchmark: SNAP, CUDA Version: 11.3.1 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.3.1

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.3.1

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.3.1 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.3.1


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.11-AT_21.03

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.47 | 99 | 197 | 394 | 788 | 101 | 202 | 403 | 806
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 22x | 44x | 88x | 176x | 23x | 45x | 90x | 180x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.47 | 106 | 212 | 423 | 847 | 110 | 219 | 438 | 877
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 24x | 47x | 95x | 189x | 25x | 49x | 98x | 196x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.91 | 430 | 860 | 1,720 | 3,440 | 438 | 876 | 1,752 | 3,503
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 19x | 38x | 75x | 150x | 19x | 38x | 76x | 153x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.22 | 458 | 916 | 1,831 | 3,663 | 473 | 946 | 1,893 | 3,785
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 20x | 39x | 79x | 158x | 20x | 41x | 82x | 163x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.58 | 1,141 | 2,281 | 4,563 | 9,126 | 1,198 | 2,396 | 4,791 | 9,583
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 47x | 94x | 12x | 25x | 50x | 99x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 1,221 | 2,442 | 4,885 | 9,769 | 1,274 | 2,549 | 5,098 | 10,195
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 25x | 50x | 99x | 13x | 26x | 52x | 104x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.46 | 35 | 69 | 138 | 276 | 35 | 70 | 141 | 282
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 24x | 47x | 95x | 189x | 24x | 48x | 96x | 193x
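
How well throughput grows with GPU count can be read off the ns/day rows. As a rough sketch (not part of the published methodology), the Python below computes a multi-GPU scaling efficiency for the PME-Cellulose_NPT_4fs row on V100 SXM2 32GB, defined here as throughput on n GPUs divided by n times the single-GPU throughput. The function name `scaling_efficiency` and this definition are assumptions made for illustration.

```python
# AMBER PME-Cellulose_NPT_4fs, ns/day on V100 SXM2 32GB, from the table above.
# Keys are GPU counts, values are the reported throughput.
ns_per_day = {1: 99, 2: 197, 4: 394, 8: 788}


def scaling_efficiency(throughput, gpus, single_gpu_throughput):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return throughput / (gpus * single_gpu_throughput)


efficiency = {
    n: scaling_efficiency(value, n, ns_per_day[1])
    for n, value in ns_per_day.items()
}

for n, eff in efficiency.items():
    print(f"{n}x GPU: {eff:.1%} of linear scaling")
```

The same calculation applies to any "bigger is better" row; for time-based rows the ratio would need to be inverted.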

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 165 | 31 | 17 | 10 | 142 | 28 | 15
Chroma | NRF | szscl21_24_128 | yes | 1x | 7x | 37x | 68x | 111x | 8x | 41x | 77x

FUN3D

Engineering

Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 99 | 50 | 26 | 15 | 88 | 45 | 24 | 14
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 6x | 13x | 25x | 43x | 7x | 15x | 28x | 46x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
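Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 227 | 260 | 437 | - | 233 | - | 313 | 234 | - | 330
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 6x | 10x | - | 6x | - | 7x | 6x | - | 8x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 62 | 92 | 146 | - | 56 | 79 | 88 | 64 | 86 | 98
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 5x | 9x | 14x | - | 4x | 6x | 8x | 5x | 8x | 9x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 15 | 28 | 45 | 51 | 12 | 23 | 32 | 15 | 27 | 36
GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 7x | 12x | 14x | 3x | 6x | 9x | 4x | 7x | 10x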

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 289 | 550 | 1,099 | 2,145 | 307 | 580 | 1,144 | 1,872
GTC | NRF | moi#proc.in | yes | 1x | 8x | 16x | 32x | 62x | 9x | 17x | 33x | 55x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 24 | 15 | 9 | 8 | 21 | 14 | 10
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | 8x | 10x | 4x | 6x | 8x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 46 | 26 | 15 | 11 | 39 | 23 | 15
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 3x | 6x | 10x | 13x | 4x | 6x | 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_10Feb2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.23E+08 | 3.43E+08 | 6.55E+08 | 1.29E+09 | 2.33E+09 | 3.41E+08 | 6.36E+08 | 1.19E+09 | 2.01E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 5x | 11x | 19x | 3x | 5x | 10x | 17x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 1.20E+08 | 2.72E+08 | 5.53E+08 | 9.41E+08 | 1.22E+08 | 2.69E+08 | 5.17E+08 | 8.23E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 5x | 9x | 16x | 2x | 5x | 9x | 14x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.94E+05 | 1.95E+06 | 3.63E+06 | 6.32E+06 | 9.96E+06 | 2.00E+06 | 3.75E+06 | 6.43E+06 | 9.36E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 8x | 16x | 27x | 43x | 9x | 16x | 28x | 40x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.20E+05 | 1.37E+06 | 2.71E+06 | 5.32E+06 | 1.02E+07 | 1.36E+06 | 2.68E+06 | 5.21E+06 | 1.00E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 11x | 23x | 44x | 85x | 11x | 22x | 43x | 83x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.59E+07 | 2.38E+08 | 4.18E+08 | 8.40E+08 | 1.39E+09 | 2.50E+08 | 4.54E+08 | 7.98E+08 | 1.18E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 9x | 18x | 30x | 5x | 10x | 17x | 26x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_c30ed15e

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 5,318 | 2,535 | 1,326 | 747 | 4,140 | 2,221 | 1,209
MILC | NRF | Apex Medium | yes | 1x | 14x | 30x | 57x | 101x | 18x | 34x | 62x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V3.0a9; Intel CPU V2.15 alpha AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

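Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 8x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 17.28 | 95 | 185 | 369 | 738 | 61 | 121 | 241 | 484 | 97 | 196 | 384 | 773
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 5x | 11x | 21x | 43x | 4x | 7x | 14x | 28x | 6x | 11x | 22x | 45x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 16.78 | 99 | 192 | 388 | 775 | 65 | 128 | 255 | 511 | 102 | 200 | 398 | 799
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 6x | 11x | 23x | 46x | 4x | 8x | 15x | 30x | 6x | 12x | 24x | 48x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 18.04 | 130 | 257 | 510 | 1,025 | 88 | 172 | 345 | 690 | 132 | 258 | 521 | 1,037
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 7x | 14x | 28x | 57x | 5x | 10x | 19x | 38x | 7x | 14x | 29x | 57x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.77 | 8 | 16 | 31 | 62 | 5 | 11 | 21 | 42 | 8 | 16 | 32 | 64
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 4x | 9x | 18x | 35x | 3x | 6x | 12x | 24x | 5x | 9x | 18x | 36x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.86 | 8 | 16 | 32 | 65 | 6 | 11 | 22 | 44 | 8 | 16 | 33 | 66
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 4x | 9x | 17x | 35x | 3x | 6x | 12x | 24x | 4x | 9x | 18x | 35x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.83 | 9 | 18 | 36 | 71 | 7 | 13 | 26 | 52 | 9 | 18 | 36 | 72
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 5x | 10x | 19x | 39x | 4x | 7x | 14x | 29x | 5x | 10x | 20x | 39x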

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x V100 SXM2 32GB | 4x V100S PCIe 32GB
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 6 | 0.62 | 0.68
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 10x | 9x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.7

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 291 | 154 | 101 | 77 | 301 | 168 | 116 | 99
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 6x | 8x | 11x | 3x | 5x | 7x | 9x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 5,047 | 2,831 | 2,090 | 1,805 | 4,994 | 2,823 | 2,109 | 1,873
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 5x | 6x | 7x | 3x | 5x | 6x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | - | 18,701 | 12,402 | 8,610 | 37,038 | 19,196 | 12,393 | 8,746
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | - | 5x | 8x | 12x | 2x | 5x | 8x | 12x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 38,171 | 76,114 | 152,191 | 304,311 | 46,131 | 91,925 | 183,975 | 367,158
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 7x | 13x | 27x | 4x | 8x | 16x | 32x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,508 | 16,846 | 33,166 | 65,870 | 9,139 | 18,183 | 36,131 | 72,316
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 9x | 17x | 2x | 5x | 10x | 19x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,186 | 14,278 | 28,394 | 56,688 | 8,518 | 16,923 | 33,678 | 67,161
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 8x | 15x | 2x | 4x | 9x | 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 159 | 82 | 43 | 25 | 133 | 69 | 37 | 22
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 14x | 28x | 53x | 91x | 17x | 33x | 62x | 104x
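
The two V100 variants in the table above can be compared directly at equal GPU counts. The short Python sketch below does so for SPECFEM3D; it is an illustration under the assumption that the Total Time row reads 159, 82, 43, 25 seconds for V100 SXM2 and 133, 69, 37, 22 seconds for V100S PCIe. The variable names are made up for this example.

```python
# SPECFEM3D four_material_simple_model, Total Time (Sec), from the table above.
# Keys are GPU counts.
sxm2_seconds = {1: 159, 2: 82, 4: 43, 8: 25}    # V100 SXM2 32GB
v100s_seconds = {1: 133, 2: 69, 4: 37, 8: 22}   # V100S PCIe 32GB

# For a "smaller is better" metric, a ratio above 1.0 means the
# V100S PCIe run finished sooner than the V100 SXM2 run.
relative = {n: sxm2_seconds[n] / v100s_seconds[n] for n in sxm2_seconds}

for n, ratio in relative.items():
    print(f"{n}x GPU: SXM2 time / V100S time = {ratio:.2f}")
```

On these numbers the ratio stays above 1.0 at every GPU count, which matches the higher NRF values listed for the V100S PCIe columns.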

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.3.1

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.3.1 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.3.1

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.3.1 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.3.1 | GROMACS Benchmark: ADH Dodec, CUDA Version: 11.3.1 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.3.1

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.3.1 | GTC Benchmark: moi#proc.in, CUDA Version: 11.3.1 | MILC Benchmark: Apex Medium, CUDA Version: 11.3.1


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.11-AT_21.03

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.47 | 67 | 134 | 267
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 15x | 30x | 60x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.47 | 68 | 137 | 274
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 15x | 31x | 61x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.91 | 323 | 647 | 1,293
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 14x | 28x | 56x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.22 | 332 | 665 | 1,330
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 14x | 29x | 57x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.58 | 1,155 | 2,309 | 4,618
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 48x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 1,180 | 2,361 | 4,722
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 24x | 48x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.46 | 23 | 47 | 94
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 16x | 32x | 64x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.05

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 117 | 40 | 26
Chroma | NRF | szscl21_24_128 | yes | 1x | 10x | 28x | 44x

FUN3D

Engineering

Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology.

VERSION

13.7 (update 1)

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 285 | 144 | 75
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 4x | 9x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021.2

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 56 | 131 | 236 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 6x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 16 | 41 | 63 | -
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 5x | -
GROMACS [STMV] | ns/day | STMV | yes | 4 | 11 | 17 | 26
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 4x | 7x
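
Some cells in the table above are marked "-", meaning no result is listed for that configuration. When working with these numbers programmatically, such cells need to be carried as missing values rather than zeros. The Python sketch below shows one way to do that; it assumes the ns/day rows read 56/131/236/- (ADH Dodec), 16/41/63/- (Cellulose), and 4/11/17/26 (STMV), and the helper name `throughput_ratios` is hypothetical. The ratios are plain GPU-to-CPU throughput ratios, not the published NRF.

```python
# GROMACS ns/day from the T4 table above; None stands for a "-" cell.
cpu_ns_per_day = {"ADH Dodec": 56, "Cellulose": 16, "STMV": 4}
t4_ns_per_day = {
    "ADH Dodec": {2: 131, 4: 236, 8: None},
    "Cellulose": {2: 41, 4: 63, 8: None},
    "STMV": {2: 11, 4: 17, 8: 26},
}


def throughput_ratios(cpu_value, gpu_by_count):
    """GPU/CPU ratio per GPU count for a 'bigger is better' metric.

    Missing measurements stay None instead of being treated as zero.
    """
    return {
        count: (value / cpu_value if value is not None else None)
        for count, value in gpu_by_count.items()
    }


ratios = {
    name: throughput_ratios(cpu_ns_per_day[name], columns)
    for name, columns in t4_ns_per_day.items()
}

for name, by_count in ratios.items():
    for count, ratio in by_count.items():
        label = "n/a" if ratio is None else f"{ratio:.1f}x"
        print(f"{name}, {count}x T4: {label}")
```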

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 270 | 531 | 985
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 29x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 36 | 21 | -
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 2x | 4x | -
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 81 | 43 | 31
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 3x | 5x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elemental particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_c30ed15e

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 6,715 | 3,470 | 2,603
MILC | NRF | Apex Medium | yes | 1x | 11x | 22x | 29x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

GPU, AMD CPU V3.0a9; Intel CPU V2.15 alpha AVX512

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 17.28 | 57 | 113 | 224
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 3x | 7x | 13x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 16.78 | 59 | 118 | 234
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 4x | 7x | 14x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 18.04 | 76 | 152 | 302
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 4x | 8x | 17x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.77 | 5 | 9 | 18
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 3x | 5x | 10x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.86 | 5 | 9 | 18
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 3x | 5x | 10x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.83 | 5 | 11 | 21
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 3x | 6x | 12x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.51 | 0.90
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 6,088 | 3,613 | 2,538
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 2x | 4x | 5x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | 35,188 | 20,365 | 12,538
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 4x | 8x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2021_05

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 29,381 | 58,743 | 117,564
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 5x | 10x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 6,417 | 12,755 | 25,212
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 7x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 5,902 | 11,754 | 23,424
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 3x | 6x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 207 | 105 | 56
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 11x | 22x | 41x