Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To calculate the NRF, we measure application performance on up to 8 CPU-only servers, then use linear scaling to estimate performance beyond 8 servers. The NRF varies by application.
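The exact NRF procedure is not published here, so the following is only a minimal sketch of one way to read the description above. The function name `node_replacement_factor`, the `cpu_throughput` mapping, and the sample measurements are all hypothetical. It assumes a "bigger is better" metric that grows with server count, interpolates inside the measured range, and scales linearly from the largest measured CPU configuration beyond it.

```python
from typing import Mapping


def node_replacement_factor(cpu_throughput: Mapping[int, float],
                            gpu_throughput: float) -> float:
    """Estimate how many CPU-only servers match one GPU-accelerated server.

    cpu_throughput maps a measured CPU-only server count (for example 1, 2,
    4, 8) to a throughput where bigger is better. This is an assumed reading
    of the NRF description, not a published formula.
    """
    if not cpu_throughput or gpu_throughput <= 0:
        raise ValueError("need CPU measurements and a positive GPU throughput")
    points = sorted(cpu_throughput.items())
    if any(t <= 0 for _, t in points):
        raise ValueError("CPU throughputs must be positive")

    prev_n, prev_t = 0, 0.0
    for n, t in points:
        if gpu_throughput <= t:
            # Inside the measured range: interpolate between neighbours.
            return prev_n + (n - prev_n) * (gpu_throughput - prev_t) / (t - prev_t)
        prev_n, prev_t = n, t

    # Beyond the largest measured configuration: scale linearly from it.
    n_max, t_max = points[-1]
    return gpu_throughput * n_max / t_max


# Hypothetical measurements, not taken from the tables on this page.
cpu_runs = {1: 10.0, 2: 19.0, 4: 36.0, 8: 64.0}
print(node_replacement_factor(cpu_runs, 400.0))  # 50.0
```

For metrics where smaller is better, such as total time in seconds, the same sketch applies if each time is first converted to a rate (1 / seconds).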

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.2.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.2.2 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.2.2

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.2.2 | GROMACS Benchmark: Cellulose, CUDA Version: 11.2.2 | LAMMPS Benchmark: SNAP, CUDA Version: 11.2.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.2.2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.2.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | GTC Benchmark: moi#proc.in, CUDA Version: 11.2.2 | MILC Benchmark: Apex Medium, CUDA Version: 11.2.2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.2.2

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM 80GB | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.2.2 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.2.2


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.9-AT_20.15

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.45 | 142 | 284 | 568 | 1,136 | 138 | 276 | 552 | 1,103
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 32x | 64x | 128x | 255x | 31x | 62x | 124x | 248x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.50 | 162 | 323 | 647 | 1,293 | 155 | 309 | 619 | 1,238
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 36x | 72x | 144x | 287x | 34x | 69x | 138x | 275x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.77 | 527 | 1,055 | 2,110 | 4,219 | 532 | 1,064 | 2,128 | 4,257
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 23x | 46x | 93x | 185x | 23x | 47x | 93x | 187x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.19 | 586 | 1,172 | 2,345 | 4,690 | 585 | 1,170 | 2,340 | 4,679
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 25x | 51x | 101x | 202x | 25x | 50x | 101x | 202x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.46 | 1,266 | 2,531 | 5,063 | 10,125 | 1,270 | 2,540 | 5,080 | 10,160
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 13x | 26x | 52x | 105x | 13x | 26x | 53x | 105x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 1,356 | 2,712 | 5,424 | 10,848 | 1,370 | 2,740 | 5,480 | 10,961
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 14x | 28x | 55x | 110x | 14x | 28x | 56x | 112x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.44 | 53 | 106 | 213 | 425 | 51 | 101 | 202 | 404
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 37x | 74x | 148x | 295x | 35x | 70x | 140x | 281x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.03

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,158 | 35 | 20 | 11 | 7 | 57 | 21 | 12 | 8
Chroma | NRF | szscl21_24_128 | yes | 1x | 33x | 57x | 104x | 171x | 20x | 55x | 97x | 148x

FUN3D

Engineering

Suite of tools actively developed at NASA for aeronautics and space technology, used to model fluid flow.

VERSION

13.7

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 528 | 51 | 27 | 16 | 11 | 57 | 30 | 17 | 12
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 13x | 24x | 42x | 58x | 11x | 21x | 38x | 54x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | - | 346 | 537 | - | - | 352 | 501 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | - | 8x | 12x | - | - | 8x | 12x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 99 | 139 | 241 | 261 | 99 | 131 | 174 | 188
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 9x | 12x | 21x | 23x | 9x | 11x | 15x | 16x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 23 | 39 | 54 | 112 | 22 | 38 | 57 | 78
GROMACS [STMV] | NRF | STMV | yes | 1x | 5x | 10x | 13x | 28x | 5x | 9x | 14x | 19x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 491 | 933 | 1,822 | 3,622 | 477 | 861 | 1,473 | 2,643
GTC | NRF | moi#proc.in | yes | 1x | 14x | 27x | 53x | 105x | 14x | 25x | 43x | 77x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 14 | 10 | 7 | 7 | 15 | 11
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 6x | 8x | 11x | 12x | 5x | 7x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 25 | 15 | 10 | 8 | 28 | 18
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 6x | 10x | 14x | 17x | 5x | 8x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_10Feb2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.23E+08 | 5.93E+08 | 1.07E+09 | 2.12E+09 | 3.80E+09 | 5.44E+08 | 1.01E+09 | 1.88E+09 | 3.42E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 9x | 18x | 32x | 5x | 8x | 16x | 28x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 2.94E+08 | 5.17E+08 | 9.67E+08 | 1.58E+09 | 2.80E+08 | 4.97E+08 | 8.49E+08 | 1.35E+09
LAMMPS [EAM] | NRF | EAM | yes | 1x | 5x | 9x | 17x | 27x | 5x | 9x | 15x | 23x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.94E+05 | 3.23E+06 | 5.55E+06 | 8.87E+06 | 1.23E+07 | 3.12E+06 | 5.56E+06 | 8.63E+06 | 1.15E+07
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 14x | 24x | 38x | 53x | 13x | 24x | 37x | 49x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.20E+05 | 1.86E+06 | 3.62E+06 | 6.98E+06 | 1.29E+07 | 1.80E+06 | 3.48E+06 | 6.61E+06 | 1.21E+07
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 15x | 30x | 58x | 108x | 15x | 29x | 55x | 101x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.59E+07 | 4.87E+08 | 8.35E+08 | 1.51E+09 | 2.00E+09 | 4.52E+08 | 7.87E+08 | 1.36E+09 | -
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 11x | 18x | 33x | 43x | 10x | 17x | 29x | -

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_12ddd7d9

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 2,246 | 1,323 | 700 | 406 | 2,860 | 1,530 | 820 | 640
MILC | NRF | Apex Medium | yes | 1x | 34x | 57x | 108x | 186x | 26x | 49x | 92x | 118x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V3.0a9

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU,

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.82 | 115 | 228 | 454 | 903 | 114 | 228 | 456 | 920
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 17x | 33x | 67x | 132x | 17x | 33x | 67x | 135x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.91 | 118 | 232 | 451 | 929 | 119 | 235 | 475 | 945
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 17x | 34x | 65x | 134x | 17x | 34x | 69x | 137x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.19 | 163 | 324 | 647 | 1,286 | 161 | 326 | 651 | 1,291
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 23x | 45x | 90x | 179x | 22x | 45x | 91x | 180x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.64 | 12 | 24 | 47 | 94 | 11 | 23 | 45 | 90
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 19x | 37x | 74x | 147x | 18x | 35x | 71x | 141x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.61 | 12 | 24 | 47 | 95 | 12 | 23 | 46 | 92
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 20x | 39x | 78x | 155x | 19x | 38x | 75x | 150x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.65 | 14 | 27 | 54 | 109 | 13 | 26 | 54 | 105
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 21x | 42x | 84x | 168x | 20x | 40x | 83x | 161x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.7

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 132 | 88 | 63 | 49 | 147 | 98 | 81
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 6x | 10x | 13x | 17x | 6x | 9x | 10x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 3,466 | 1,960 | 1,580 | 1,406 | 3,620 | 1,954 | 1,611 | 1,410
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 8x | 9x | 4x | 7x | 8x | 9x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | 23,669 | 12,529 | 8,434 | 6,087 | 26,083 | 13,684 | 9,363 | 6,656
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 8x | 12x | 17x | 3x | 6x | 11x | 15x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_10

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 89,691 | 178,713 | 357,385 | 715,048 | 75,173 | 149,871 | 299,686 | 597,953
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 32x | 63x | 7x | 13x | 26x | 53x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 13,524 | 26,379 | 52,890 | 105,494 | 12,991 | 25,215 | 50,612 | 100,010
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 4x | 7x | 14x | 28x | 3x | 7x | 13x | 27x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 13,940 | 27,590 | 54,700 | 108,487 | 12,084 | 23,769 | 47,135 | 93,598
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 14x | 29x | 3x | 6x | 12x | 25x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 40GB | 2x A100 PCIe 40GB | 4x A100 PCIe 40GB | 8x A100 PCIe 40GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 77 | 40 | 22 | 14 | 89 | 47 | 25 | 17
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 30x | 56x | 104x | 161x | 25x | 49x | 90x | 135x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.2.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.2.2

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.2.2 | GROMACS Benchmark: Cellulose, CUDA Version: 11.2.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.2.2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.2.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.2.2 | GTC Benchmark: moi#proc.in, CUDA Version: 11.2.2 | MILC Benchmark: Apex Medium, CUDA Version: 11.2.2

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A30 PCIe | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.2.2 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.2.2


Detailed A30 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.9-AT_20.15

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.45 | 77 | 154 | 308 | 616
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 17x | 35x | 69x | 138x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.50 | 84 | 168 | 336 | 672
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 19x | 37x | 75x | 149x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.77 | 325 | 649 | 1,298 | 2,597
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 14x | 29x | 57x | 114x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.19 | 355 | 709 | 1,419 | 2,837
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 15x | 31x | 61x | 122x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.46 | 888 | 1,776 | 3,552 | 7,103
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 9x | 18x | 37x | 74x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 947 | 1,893 | 3,787 | 7,574
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 10x | 19x | 39x | 77x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.44 | 29 | 57 | 114 | 228
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 20x | 40x | 79x | 159x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.03

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A30 | 4x A30 | 8x A30
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,158 | 33 | 18 | 11
Chroma | NRF | szscl21_24_128 | yes | 1x | 35x | 65x | 106x

FUN3D

Engineering

Suite of tools actively developed at NASA for aeronautics and space technology, used to model fluid flow.

VERSION

13.7

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 528 | 107 | 54 | 29 | 18
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 5x | 12x | 22x | 37x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 193 | 206 | 351 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 4x | 5x | 8x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 52 | 73 | 110 | 121
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 3x | 5x | 10x | 11x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 12 | 21 | 33 | 51
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 5x | 8x | 13x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 272 | 510 | 922 | 1,716
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 27x | 50x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 25 | 17 | - | -
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | - | -
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | - | 28 | 23 | 20
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | - | 5x | 6x | 7x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_10Feb2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.23E+08 | 2.83E+08 | 5.47E+08 | 1.03E+09 | 1.93E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 5x | 9x | 16x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 1.37E+08 | 2.59E+08 | 4.69E+08 | 8.06E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 4x | 8x | 14x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.94E+05 | 1.78E+06 | 3.30E+06 | 5.54E+06 | 8.30E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 6x | 14x | 24x | 36x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.20E+05 | 1.01E+06 | 2.00E+06 | 3.83E+06 | 7.28E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 8x | 17x | 32x | 60x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.59E+07 | 2.37E+08 | 4.22E+08 | 7.41E+08 | 1.18E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 9x | 16x | 26x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_12ddd7d9

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 5,881 | 2,339 | 1,238 | 716
MILC | NRF | Apex Medium | yes | 1x | 13x | 32x | 61x | 105x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V3.0a9

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU,

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.82 | 73 | 146 | 294 | 584
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 11x | 21x | 43x | 86x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.91 | 76 | 152 | 301 | 604
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 11x | 22x | 44x | 87x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.19 | 92 | 184 | 365 | 739
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 13x | 26x | 51x | 103x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.64 | 6 | 13 | 25 | 51
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 10x | 20x | 40x | 79x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.61 | 6 | 13 | 26 | 52
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 11x | 21x | 42x | 85x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.65 | 7 | 14 | 28 | 57
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 11x | 22x | 44x | 87x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.7

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 294 | 137 | 105
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 6x | 8x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 5,350 | 2,905 | 2,222 | 1,814
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 2x | 4x | 6x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | - | 19,127 | 12,345 | 8,287
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | - | 5x | 8x | 12x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_10

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 44,114 | 87,889 | 175,865 | 350,517
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 4x | 8x | 16x | 31x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 7,010 | 13,839 | 27,490 | 54,761
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 7x | 15x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,015 | 13,899 | 27,504 | 54,837
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x | 15x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A30 | 2x A30 | 4x A30 | 8x A30
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 156 | 80 | 42 | 24
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 15x | 28x | 54x | 96x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.2.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.2.2

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.2.2 | GROMACS Benchmark: Cellulose, CUDA Version: 11.2.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.2.2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.2.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.2.2 | GTC Benchmark: moi#proc.in, CUDA Version: 11.2.2 | MILC Benchmark: Apex Medium, CUDA Version: 11.2.2

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual EPYC 7742@2.25GHz with 4x NVIDIA A40 PCIe | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.2.2


Detailed A40 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.9-AT_20.15

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.45 | 90 | 180 | 360 | 719
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 20x | 40x | 81x | 162x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.50 | 100 | 199 | 399 | 798
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 22x | 44x | 89x | 177x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.77 | 402 | 804 | 1,609 | 3,217
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 18x | 35x | 71x | 141x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.19 | 445 | 891 | 1,781 | 3,562
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 19x | 38x | 77x | 154x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.46 | 994 | 1,987 | 3,975 | 7,949
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 10x | 21x | 41x | 82x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 1,061 | 2,121 | 4,243 | 8,485
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 22x | 43x | 86x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.44 | 33 | 66 | 132 | 264
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 23x | 46x | 92x | 184x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.03

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,158 | 77 | 40 | 22 | 12
Chroma | NRF | szscl21_24_128 | yes | 1x | 15x | 29x | 52x | 93x

FUN3D

Engineering

Suite of tools actively developed at NASA for aeronautics and space technology, used to model fluid flow.

VERSION

13.7

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 528 | 222 | 113 | 58 | 32
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 3x | 5x | 11x | 20x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 311 | 328 | 453 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 7x | 8x | 10x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 77 | 106 | 144 | 167
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 5x | 9x | 13x | 15x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 18 | 35 | 51 | 65
GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 9x | 13x | 16x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 331 | 613 | 1,077 | 1,990
GTC | NRF | moi#proc.in | yes | 1x | 10x | 18x | 31x | 58x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 30 | - | 18 | 17
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | - | 4x | 5x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 65 | 36 | 28 | 22
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 4x | 5x | 7x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_10Feb2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x A40 | 4x A40 | 8x A40
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.23E+08 | 2.13E+08 | 4.10E+08 | 7.99E+08
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 3x | 7x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 1.01E+08 | 1.90E+08 | 3.51E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 3x | 6x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.94E+05 | 7.57E+05 | 1.38E+06 | 2.29E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 2x | 5x | 10x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.20E+05 | 4.81E+05 | 9.56E+05 | 1.89E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 4x | 8x | 16x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.59E+07 | 9.21E+07 | 1.78E+08 | 3.34E+08
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 2x | 4x | 7x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_12ddd7d9

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 5,799 | 3,015 | 1,584 | 911
MILC | NRF | Apex Medium | yes | 1x | 13x | 25x | 48x | 83x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V3.0a9

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU,

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.82 | 87 | 175 | 351 | 697
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 13x | 26x | 51x | 102x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.91 | 93 | 184 | 371 | 736
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 13x | 27x | 54x | 107x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.19 | 132 | 263 | 534 | 1,048
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 18x | 37x | 74x | 146x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.64 | 8 | 16 | 31 | 62
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 12x | 24x | 48x | 96x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.61 | 8 | 16 | 32 | 63
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 13x | 26x | 52x | 103x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.65 | 10 | 20 | 40 | 79
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 15x | 31x | 61x | 122x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 5,131 | 2,908 | 2,124 | 1,825
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 4x | 6x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | 27,231 | 14,279 | 9,736 | 6,979
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 6x | 10x | 15x
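
The wall-clock rows above also show how well a workload scales as GPUs are added. The sketch below estimates strong-scaling efficiency for the RELION Plasmodium Ribosome run on A40, taking the 1x A40 time as the baseline. Efficiency is a standard derived metric rather than a figure published on this page, and the function name `scaling_efficiency` is hypothetical.

```python
# Total Wall Clock (Sec) for Relion [Plasmodium Ribosome], keyed by A40 GPU count.
wall_clock_s = {1: 5_131, 2: 2_908, 4: 2_124, 8: 1_825}


def scaling_efficiency(times: dict) -> dict:
    """Return {gpu_count: efficiency} relative to the smallest GPU count.

    Efficiency 1.0 means doubling the GPUs halves the wall-clock time.
    """
    base_gpus = min(times)
    base_time = times[base_gpus]
    return {n: (base_time * base_gpus) / (t * n) for n, t in times.items()}


efficiency = scaling_efficiency(wall_clock_s)

for n, e in sorted(efficiency.items()):
    print(f"{n}x A40: {e:.0%} of ideal scaling")
```

An efficiency well below 100% at 8 GPUs suggests that part of the run does not parallelize across GPUs, which matches the modest NRF growth in the table.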

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_10

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 36,696
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 10,216
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,369
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A40 | 2x A40 | 4x A40 | 8x A40
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 174 | 89 | 47 | 26
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 13x | 26x | 49x | 87x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.2.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.2.2 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.2.2

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.2.2 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.2.2 | GROMACS Benchmark: Cellulose, CUDA Version: 11.2.2 | LAMMPS Benchmark: SNAP, CUDA Version: 11.2.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.2.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.2.2 | GTC Benchmark: moi#proc.in, CUDA Version: 11.2.2 | MILC Benchmark: Apex Medium, CUDA Version: 11.2.2

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.2.2 | ICON Benchmark: SLAM 191 levels 160 km resolution with radiation, CUDA Version: 11.2.2


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.9-AT_20.15

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.45 | 94 | 189 | 377 | 754 | 96 | 192 | 383 | 766
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 21x | 42x | 85x | 170x | 22x | 43x | 86x | 172x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.50 | 105 | 211 | 421 | 842 | 107 | 215 | 429 | 858
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 23x | 47x | 94x | 187x | 24x | 48x | 95x | 191x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.77 | 418 | 837 | 1,673 | 3,346 | 423 | 847 | 1,693 | 3,387
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 18x | 37x | 73x | 147x | 19x | 37x | 74x | 149x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.19 | 461 | 921 | 1,843 | 3,686 | 468 | 937 | 1,874 | 3,747
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 20x | 40x | 79x | 159x | 20x | 40x | 81x | 162x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.46 | 1,146 | 2,291 | 4,583 | 9,166 | 1,192 | 2,384 | 4,768 | 9,536
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 12x | 24x | 48x | 95x | 12x | 25x | 49x | 99x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 1,227 | 2,455 | 4,909 | 9,818 | 1,282 | 2,564 | 5,127 | 10,254
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 12x | 25x | 50x | 100x | 13x | 26x | 52x | 104x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.44 | 34 | 67 | 135 | 270 | 34 | 68 | 136 | 272
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 23x | 47x | 94x | 187x | 24x | 47x | 95x | 189x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.03

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,158 | 163 | 31 | 17 | 10 | 155 | 27 | 15
Chroma | NRF | szscl21_24_128 | yes | 1x | 7x | 38x | 70x | 115x | 8x | 42x | 78x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.7

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 528 | 96 | 49 | 25 | 15 | 85 | 44 | 23 | 14
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 6x | 13x | 26x | 44x | 7x | 15x | 28x | 47x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
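Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100 SXM2 32GB
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 228 | 260 | 444 | 237 | 253 | 310 | 236 | 271 | 327 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 6x | 10x | 5x | 6x | 7x | 5x | 6x | 8x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 62 | 93 | 147 | 57 | 80 | 87 | 64 | 87 | 98 | -
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 4x | 8x | 13x | 3x | 5x | 6x | 4x | 6x | 9x | -
GROMACS [STMV] | ns/day | STMV | yes | 4 | 15 | 28 | 44 | 12 | 23 | 32 | 15 | 27 | 36 | 52
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 7x | 11x | 3x | 6x | 8x | 3x | 6x | 9x | 13x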

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 280 | 535 | 1,065 | 2,081 | 297 | 560 | 1,109 | 1,825
GTC | NRF | moi#proc.in | yes | 1x | 8x | 16x | 31x | 61x | 9x | 16x | 32x | 53x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 24 | 15 | 10 | 8 | 21 | 14 | 10
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 5x | 8x | 10x | 4x | 6x | 8x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 46 | 26 | 15 | 11 | 39 | 23 | 15
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 3x | 6x | 10x | 13x | 4x | 6x | 10x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_10Feb2021

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.23E+08 | 3.46E+08 | 6.52E+08 | 1.30E+09 | 2.35E+09 | 3.45E+08 | 6.37E+08 | 1.19E+09 | 2.02E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 5x | 11x | 20x | 3x | 5x | 10x | 17x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.82E+07 | 1.21E+08 | 2.71E+08 | 5.53E+08 | 9.48E+08 | 1.24E+08 | 2.69E+08 | 5.21E+08 | 8.17E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 5x | 9x | 16x | 2x | 5x | 9x | 14x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.94E+05 | 1.97E+06 | 3.66E+06 | 6.33E+06 | 9.99E+06 | 2.03E+06 | 3.78E+06 | 6.49E+06 | 9.45E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 8x | 16x | 27x | 43x | 9x | 16x | 28x | 41x
LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.20E+05 | 1.35E+06 | 2.67E+06 | 5.22E+06 | 1.01E+07 | 1.35E+06 | 2.66E+06 | 5.21E+06 | 9.96E+06
LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 11x | 22x | 43x | 84x | 11x | 22x | 43x | 83x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.59E+07 | 2.37E+08 | 4.17E+08 | 8.31E+08 | 1.39E+09 | 2.48E+08 | 4.50E+08 | 7.77E+08 | 1.19E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 5x | 9x | 18x | 30x | 5x | 10x | 17x | 26x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_12ddd7d9

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 5,301 | 2,525 | 1,328 | 756 | 4,113 | 2,191 | 1,198
MILC | NRF | Apex Medium | yes | 1x | 14x | 30x | 57x | 100x | 18x | 34x | 63x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V3.0a9

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

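Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x RTX6000 | 2x RTX6000 | 4x RTX6000 | 8x RTX6000 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.82 | 94 | 184 | 367 | 741 | 59 | 119 | 236 | 473 | 96 | 192 | 383 | 765
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 14x | 27x | 54x | 109x | 9x | 18x | 35x | 69x | 14x | 28x | 56x | 112x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.91 | 98 | 193 | 386 | 769 | 63 | 125 | 250 | 498 | 101 | 197 | 396 | 788
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 14x | 28x | 56x | 111x | 9x | 18x | 36x | 72x | 15x | 29x | 57x | 114x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.19 | 127 | 254 | 510 | 1,016 | 84 | 168 | 335 | 669 | 131 | 259 | 514 | 1,032
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 18x | 35x | 71x | 141x | 12x | 23x | 47x | 93x | 18x | 36x | 71x | 144x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.64 | 8 | 16 | 31 | 63 | 5 | 10 | 21 | 41 | 8 | 16 | 32 | 64
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 12x | 24x | 49x | 98x | 8x | 16x | 32x | 64x | 13x | 25x | 50x | 100x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.61 | 8 | 16 | 32 | 64 | 5 | 11 | 21 | 43 | 8 | 16 | 33 | 65
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 13x | 26x | 53x | 105x | 9x | 17x | 35x | 70x | 14x | 27x | 53x | 107x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.65 | 9 | 18 | 36 | 72 | 6 | 12 | 25 | 50 | 9 | 18 | 36 | 72
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 14x | 27x | 55x | 110x | 10x | 19x | 38x | 78x | 14x | 28x | 56x | 111x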

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x V100 SXM2 32GB | 4x V100S PCIe 32GB
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 6 | 0.62 | 0.68
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 10x | 9x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.7

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 765 | 302 | 159 | 104 | 79 | 308 | 172 | 118 | 99
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 3x | 5x | 8x | 11x | 3x | 5x | 7x | 9x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 5,021 | 2,827 | 2,095 | 1,903 | 5,007 | 2,818 | 2,125 | 1,853
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 3x | 5x | 6x | 7x | 3x | 5x | 6x | 7x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | 35,305 | 18,324 | 12,201 | 8,654 | 36,352 | 18,746 | 12,306 | 8,634
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 5x | 8x | 12x | 2x | 5x | 8x | 12x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_10

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 38,108 | 76,034 | 152,322 | 304,163 | 46,082 | 91,987 | 183,880 | 367,934
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 7x | 13x | 27x | 4x | 8x | 16x | 33x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 8,522 | 16,839 | 33,168 | 65,821 | 9,246 | 18,247 | 36,283 | 72,268
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 9x | 17x | 2x | 5x | 10x | 19x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,189 | 14,285 | 28,404 | 56,713 | 8,524 | 16,934 | 33,674 | 67,186
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 8x | 15x | 2x | 4x | 9x | 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 SXM2 32GB | 2x V100 SXM2 32GB | 4x V100 SXM2 32GB | 8x V100 SXM2 32GB | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 159 | 82 | 44 | 25 | 133 | 69 | 37 | 22
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 14x | 28x | 52x | 91x | 17x | 33x | 61x | 104x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.2.2

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.2.2 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.2.2

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 11.2.2 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.2.2 | GROMACS Benchmark: ADH Dodec, CUDA Version: 11.2.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.2.2

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.2.2 | GTC Benchmark: moi#proc.in, CUDA Version: 11.2.2 | MILC Benchmark: Apex Medium, CUDA Version: 11.2.2


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.9-AT_20.15

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.45 | 63 | 126 | 252
AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 14x | 28x | 57x
AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.50 | 65 | 131 | 261
AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 15x | 29x | 58x
AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.77 | 302 | 603 | 1,207
AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 13x | 26x | 53x
AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 23.19 | 312 | 624 | 1,249
AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 13x | 27x | 54x
AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 96.46 | 1,010 | 2,020 | 4,040
AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 10x | 21x | 42x
AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 98.20 | 1,034 | 2,067 | 4,134
AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 11x | 21x | 42x
AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.44 | 23 | 45 | 90
AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 16x | 31x | 63x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2021.03

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,158 | 123 | 39 | 23
Chroma | NRF | szscl21_24_128 | yes | 1x | 9x | 29x | 49x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.7

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 528 | 278 | 141 | 72
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 4x | 9x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2021

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 132 | 236 | -
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 5x | -
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 42 | 63 | 71
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 4x | 5x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 11 | 17 | 26
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x | 6x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5 Updated

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 267 | 525 | 971
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 28x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.2+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 80 | 36 | 21 | -
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 2x | 4x | -
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 146 | 79 | 43 | 31
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 2x | 3x | 5x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_12ddd7d9

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 68,474 | 6,762 | 3,470 | 2,180
MILC | NRF | Apex Medium | yes | 1x | 11x | 22x | 35x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V3.0a9

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 6.82 | 55 | 111 | 219
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 8x | 16x | 32x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 6.91 | 58 | 115 | 227
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 8x | 17x | 33x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.19 | 75 | 149 | 295
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 10x | 21x | 41x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.64 | 4 | 9 | 17
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 14x | 27x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.61 | 4 | 9 | 18
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 7x | 15x | 29x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.65 | 5 | 10 | 21
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 8x | 16x | 32x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.51 | 0.90
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.1.2

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 13,013 | 6,061 | 3,589 | 2,479
Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 2x | 4x | 5x
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 80,907 | 35,311 | 20,500 | 12,452
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 3x | 4x | 8x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_10

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 29,438 | 58,891 | 117,945
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 5x | 10x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 6,491 | 12,610 | 25,093
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 7x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 5,926 | 11,746 | 23,449
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 3x | 6x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_b50cc7b9

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,991 | 207 | 105 | 56
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 11x | 22x | 41x