Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA A100, V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers. The NRF varies by application.
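The NRF arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to produce the tables: `simple_nrf` and `nrf_from_cpu_curve` are hypothetical helper names, and extending the CPU curve with the slope of its last measured segment is only one possible reading of "linear scaling". The single-node ratio reproduces some published values (AMBER PME-Cellulose_NPT on 1x A100: 147 / 4.28 ≈ 34x) but falls below others (FUN3D on 1x A100: 526 / 57 ≈ 9x against a published 11x), which would be consistent with measured multi-server CPU scaling being less than ideal.

```python
def simple_nrf(cpu_value, gpu_value, bigger_is_better=True):
    """Single-node ratio of GPU-server to CPU-server performance.

    Assumes CPU performance scales perfectly linearly with server count,
    which is a simplification of the method described on this page.
    """
    if cpu_value <= 0 or gpu_value <= 0:
        raise ValueError("performance values must be positive")
    # For time-based metrics (lower is better) the ratio is inverted.
    return gpu_value / cpu_value if bigger_is_better else cpu_value / gpu_value


def nrf_from_cpu_curve(cpu_curve, gpu_throughput):
    """Estimate the NRF from CPU throughput measured at several server counts.

    cpu_curve maps server count to throughput (bigger is better).
    Inside the measured range the curve is interpolated linearly. Beyond the
    largest measured count it is extended with the slope of the last segment
    (ASSUMPTION: the page does not say how the extrapolation is done).
    """
    points = sorted(cpu_curve.items())
    if len(points) < 2:
        raise ValueError("need at least two measured server counts")
    for (_, p0), (_, p1) in zip(points, points[1:]):
        if p1 <= p0:
            raise ValueError("throughput must increase with server count")

    first_n, first_p = points[0]
    if gpu_throughput <= first_p:
        # Below the smallest measurement: scale the first point proportionally.
        return first_n * gpu_throughput / first_p

    for (n0, p0), (n1, p1) in zip(points, points[1:]):
        if gpu_throughput <= p1:
            return n0 + (gpu_throughput - p0) * (n1 - n0) / (p1 - p0)

    # Beyond the largest measured server count: linear extrapolation.
    (n0, p0), (n1, p1) = points[-2], points[-1]
    return n1 + (gpu_throughput - p1) * (n1 - n0) / (p1 - p0)


if __name__ == "__main__":
    # Values taken from the A100 tables on this page.
    print(round(simple_nrf(4.28, 147)))                           # AMBER Cellulose_NPT, 1x A100
    print(round(simple_nrf(1130, 60, bigger_is_better=False)))    # Chroma, 1x A100

    # Hypothetical CPU scaling curve (made-up numbers, for illustration only).
    made_up_curve = {1: 10, 2: 19, 4: 36, 8: 64}
    print(nrf_from_cpu_curve(made_up_curve, 100))
```

The made-up curve in the usage example loses efficiency as servers are added, so the extrapolated estimate (about 13 servers for a throughput of 100) is higher than the single-node ratio of 10. To apply `nrf_from_cpu_curve` to a time-based metric, convert each time to a rate (for example 1 / seconds) first.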

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.0 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.0

Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.0 | GROMACS Benchmark: Cellulose, CUDA Version: 11.0 | LAMMPS Benchmark: Tersoff, CUDA Version: 10.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.0 | GTC Benchmark: moi#proc.in, CUDA Version: 11.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.0

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: AMD EPYC 7742@2.25GHz with 4x NVIDIA A100 SXM4 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.0


Detailed A100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.00 Preview

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.28 147 294 587 1,175
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 34x 69x 137x 274x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.31 159 318 636 1,273
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 37x 74x 148x 295x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 21.81 525 1,050 2,100 4,200
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 24x 48x 96x 193x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 22.01 563 1,125 2,251 4,502
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 26x 51x 102x 205x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 93.04 1,192 2,384 4,768 9,537
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 13x 26x 51x 103x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 93.92 1,272 2,544 5,088 10,177
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 14x 27x 54x 108x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.47 54 107 215 430
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 37x 73x 146x 292x
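The ns/day metric relates directly to how many integration steps a run completes per second. A minimal sketch of the conversion, assuming that the "_4fs" suffix in the test names denotes a 4 fs timestep (the page does not state this explicitly) and using hypothetical helper names:

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def ns_per_day(steps_per_second, timestep_fs):
    """Simulated nanoseconds per wall-clock day for a given step rate."""
    return steps_per_second * timestep_fs * SECONDS_PER_DAY / FS_PER_NS


def steps_per_second(ns_day, timestep_fs):
    """Inverse conversion: integration steps per wall-clock second."""
    return ns_day * FS_PER_NS / (timestep_fs * SECONDS_PER_DAY)


if __name__ == "__main__":
    # PME-Cellulose_NPT on 1x A100 is listed at 147 ns/day.
    # ASSUMPTION: 4 fs timestep, inferred from the "_4fs" test name.
    rate = steps_per_second(147, timestep_fs=4)
    print(f"{rate:.1f} steps/s")
```

Under that assumption, 147 ns/day corresponds to roughly 425 integration steps per second.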

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2020.0

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 | 1x A100 SXM4 | 2x A100 SXM4 | 4x A100 SXM4 | 8x A100 SXM4
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,130 | 60 | 22 | 12 | 8
Chroma | NRF | szscl21_24_128 | yes | 1x | 19x | 51x | 91x | 151x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.6

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 526 57 30 19
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 11x 20x 33x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.1

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4
GROMACS [ADH Dodec] ns/day ADH Dodec yes 58 298 321 444 -
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 7x 8x 10x -
GROMACS [Cellulose] ns/day Cellulose yes 17 89 125 196 246
GROMACS [Cellulose] NRF Cellulose yes 1x 8x 11x 18x 22x
GROMACS [STMV] ns/day STMV yes 4 19 35 53 99
GROMACS [STMV] NRF STMV yes 1x 4x 9x 13x 25x
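How well a benchmark scales across GPUs can be read from these tables by dividing the speedup over the smallest configuration by the increase in GPU count. A minimal sketch, assuming a hypothetical helper `scaling_efficiency` and taking the GROMACS Cellulose ns/day row and the Chroma Total Time row from the A100 tables as inputs:

```python
def scaling_efficiency(values_by_gpu_count, bigger_is_better=True):
    """Return {gpu_count: efficiency} relative to the smallest GPU count.

    An efficiency of 1.0 means ideal linear scaling; values below 1.0 mean
    each added GPU contributes less, values above 1.0 mean superlinear scaling.
    """
    base_count = min(values_by_gpu_count)
    base_value = values_by_gpu_count[base_count]
    result = {}
    for count, value in sorted(values_by_gpu_count.items()):
        # Throughput metrics improve by growing, time metrics by shrinking.
        speedup = value / base_value if bigger_is_better else base_value / value
        result[count] = speedup / (count / base_count)
    return result


if __name__ == "__main__":
    gromacs_cellulose_ns_day = {1: 89, 2: 125, 4: 196, 8: 246}
    chroma_total_time_sec = {1: 60, 2: 22, 4: 12, 8: 8}

    for gpus, eff in scaling_efficiency(gromacs_cellulose_ns_day).items():
        print(f"GROMACS Cellulose, {gpus}x A100: {eff:.2f}")
    for gpus, eff in scaling_efficiency(chroma_total_time_sec, bigger_is_better=False).items():
        print(f"Chroma, {gpus}x A100: {eff:.2f}")
```

With these inputs, GROMACS Cellulose drops to about 0.35 at 8 GPUs, while Chroma reaches about 1.36 at 2 GPUs, that is, better than linear on this benchmark.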

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4
GTC Mpush/Sec moi#proc.in yes 35 344 650 1,273 2,440
GTC NRF moi#proc.in yes 1x 10x 19x 37x 71x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.0+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 130 16 12 10
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 8x 11x 14x
ICON [SLAM 191 - 160KM - with radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 300 31 20 14
ICON [SLAM 191 - 160KM - with radiation] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 10x 15x 21x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_20Nov2019

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 | 1x A100 SXM4 | 2x A100 SXM4 | 4x A100 SXM4
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.24E+08 | 5.52E+08 | 1.01E+09 | 1.83E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 9x | 17x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 6.15E+07 | 2.02E+08 | 3.47E+08 | 6.47E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 7x | 13x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.55E+05 | 2.39E+06 | 3.97E+06 | 5.42E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 6x | 12x | 16x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 5.83E+07 | 4.28E+08 | 7.70E+08 | 1.32E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 7x | 12x | 20x
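The ATOM-Time Steps/s metric is presumably the number of atoms multiplied by the number of timesteps completed per second; the page does not define it, so treat that reading as an assumption. Under it, a throughput figure can be turned into a step rate or a wall-clock estimate for a system of a given size. The helper names, the atom count and the step count below are hypothetical and are not the sizes of the benchmarks in the table:

```python
def timesteps_per_second(atom_steps_per_second, n_atoms):
    """Timesteps completed per second for a system of n_atoms.

    ASSUMPTION: throughput = atoms * timesteps / second.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return atom_steps_per_second / n_atoms


def wall_time_seconds(atom_steps_per_second, n_atoms, n_steps):
    """Estimated wall-clock time to run n_steps on n_atoms."""
    if atom_steps_per_second <= 0:
        raise ValueError("throughput must be positive")
    return n_atoms * n_steps / atom_steps_per_second


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 atoms for 10,000 steps.
    n_atoms, n_steps = 1_000_000, 10_000
    cpu = 1.24e8   # LJ 2.5, Dual Cascade Lake 6240 (from the table)
    gpu = 5.52e8   # LJ 2.5, 1x A100 SXM4 (from the table)
    print(f"CPU server: {wall_time_seconds(cpu, n_atoms, n_steps):.1f} s")
    print(f"1x A100:    {wall_time_seconds(gpu, n_atoms, n_steps):.1f} s")
```

This treats throughput as independent of system size, which real runs only approximate; small systems may not saturate a GPU.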

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_f8533104

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4
MILC Total Time (Sec) Apex Medium no 67,299 3,431 2,000 1,053 585
MILC NRF Apex Medium yes 1x 22x 37x 70x 126x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a1

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, Multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 7.34 113.23 203.50 447.46 885.68
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 15x 28x 61x 121x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 7.18 116.07 230.72 460.04 914.82
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 16x 32x 64x 127x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 7.45 142.83 283.43 563.69 1114.82
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 19x 38x 76x 150x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 0.65 11.03 21.68 43.86 86.27
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 17x 33x 67x 133x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 0.65 11.16 21.84 44.36 87.83
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 17x 34x 68x 135x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 0.68 12.66 24.77 49.47 97.94
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 19x 36x 73x 144x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.5 (GPU) / 6.4.1 (CPU)

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 | 2x A100 SXM4 | 4x A100 SXM4 | 8x A100 SXM4
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 724 | 145 | 86 | 63
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 6x | 9x | 13x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_02

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 68,229 136,243 271,820 544,960
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 6x 12x 24x 48x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 13,033 25,849 51,519 102,840
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 3x 7x 14x 27x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 12,230 24,224 48,105 95,526
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 3x 6x 13x 25x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0caec104

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4 8x A100 SXM4
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,866 89 47 25 16
SPECFEM3D NRF four_material_simple_model yes 1x 24x 46x 84x 131x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

V 6.1

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

www.vasp.at

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 1x A100 SXM4 2x A100 SXM4 4x A100 SXM4
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] Elapsed Time (Sec) GaAsBi-512 no 3,098 784 448 301
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] NRF GaAsBi-512 yes 1x - 11x 17x
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] Elapsed Time (Sec) Si256_VJT_HSE06 no 3,315 653 383 225
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] NRF Si256_VJT_HSE06 yes 1x 6x 10x 16x
VASP [Si-Huge (DFT-DAV-REAL-STD)] Elapsed Time (Sec) Si-Huge (DFT-DAV-REAL-STD) no 3,651 949 599 414
VASP [Si-Huge (DFT-DAV-REAL-STD)] NRF Si-Huge (DFT-DAV-REAL-STD) yes 1x - 7x 10x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.0 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.0

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 10.2 | AMBER Benchmark: DC-Cellulose_NVE, CUDA Version: 11.0 | GROMACS Benchmark: Cellulose, CUDA Version: 11.0 | LAMMPS Benchmark: ReaxFF/C, CUDA Version: 10.2 | NAMD Benchmark: apoa1_nve_cuda, CUDA Version: 11.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.0 | GTC Benchmark: moi#proc.in, CUDA Version: 11.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.0

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2 | Quantum Espresso Benchmark: AUSURF112-jR, CUDA Version: 11.0


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.00 Preview

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 16GB SXM2 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.28 102 204 408 817 101 203 406 811
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 24x 48x 95x 191x 24x 47x 95x 190x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.31 108 217 433 867 109 218 435 870
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 25x 50x 101x 201x 25x 50x 101x 202x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 21.81 431 861 1,722 3,444 441 882 1,764 3,528
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 20x 39x 79x 158x 20x 40x 81x 162x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 22.01 463 926 1,852 3,704 471 942 1,885 3,769
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 21x 42x 84x 168x 21x 43x 86x 171x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 93.04 1,127 2,254 4,509 9,017 1,181 2,363 4,725 9,451
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 12x 24x 48x 97x 13x 25x 51x 102x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 93.92 1,202 2,403 4,806 9,612 1,257 2,515 5,029 10,059
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 13x 26x 51x 102x 13x 27x 54x 107x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.47 36 71 142 284 35 70 140 280
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 24x 48x 97x 193x 24x 48x 95x 191x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2020.0

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,130 | 139 | 31 | 16 | 11 | 131 | 27 | 15
Chroma | NRF | szscl21_24_128 | yes | 1x | 8x | 37x | 71x | 101x | 9x | 42x | 78x

CryoSPARC

Microscopy

CryoSPARC is a state-of-the-art scientific software platform for cryo-electron microscopy (cryo-EM) used in research and drug discovery pipelines

VERSION

V2.11.0

ACCELERATED FEATURES

  • All-GPU application. The new cryoSPARC Live product allows streaming HPC reconstruction as the microscope acquires data.

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://cryosparc.com/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB
CryoSPARC [class_2D_200] Total Time (Sec) class_2D_200 no 13,297 2,349 1,896 2,189 1,714
CryoSPARC [class_2D_200] NRF class_2D_200 yes 1x 6x 7x 6x 8x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.6

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 526 97 49 26 19 86 44 24 19
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 6x 13x 24x 33x 7x 14x 26x 33x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.1

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 1x RTX6000 2x RTX6000 4x RTX6000 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB
GROMACS [ADH Dodec] ns/day ADH Dodec yes 58 197 248 446 207 242 276 205 260 301
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 5x 6x 10x 5x 6x 7x 5x 6x 7x
GROMACS [Cellulose] ns/day Cellulose yes 17 58 91 145 54 78 79 60 86 90
GROMACS [Cellulose] NRF Cellulose yes 1x 4x 8x 13x 3x 5x 6x 4x 6x 8x
GROMACS [STMV] ns/day STMV yes 4 12 25 41 12 25 30 12 25 34
GROMACS [STMV] NRF STMV yes 1x 3x 6x 10x 3x 6x 7x 3x 6x 9x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
GTC Mpush/Sec moi#proc.in yes 35 228 422 828 1,594 230 435 852 1,645
GTC NRF moi#proc.in yes 1x 7x 12x 24x 46x 7x 13x 25x 48x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.0+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB
ICON [SLAM 191 - 160KM - no radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution without radiation no 130 25 16 11 10 22 15 12
ICON [SLAM 191 - 160KM - no radiation] NRF SLAM 191 levels 160 km resolution without radiation yes 1x 5x 8x 12x 14x 6x 9x 11x
ICON [SLAM 191 - 160KM - with radiation] Integrate_nh (sec) SLAM 191 levels 160 km resolution with radiation no 300 49 28 17 17 42 25 17
ICON [SLAM 191 - 160KM - with radiation] NRF SLAM 191 levels 160 km resolution with radiation yes 1x 6x 11x 17x 18x 7x 12x 17x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_20Nov2019

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.24E+08 | 2.92E+08 | 5.35E+08 | 1.16E+09 | 2.12E+09 | 2.92E+08 | 5.17E+08 | 1.08E+09 | 1.85E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 2x | 5x | 11x | 20x | 2x | 5x | 10x | 17x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 6.15E+07 | 1.06E+08 | 2.22E+08 | 4.30E+08 | 7.63E+08 | 1.08E+08 | 2.19E+08 | 4.01E+08 | 6.94E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 4x | 8x | 15x | 2x | 4x | 8x | 14x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.55E+05 | 1.56E+06 | 2.72E+06 | 4.55E+06 | 7.00E+06 | 1.64E+06 | 2.85E+06 | 4.65E+06 | 7.25E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 4x | 8x | 13x | 21x | 4x | 8x | 14x | 21x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 5.83E+07 | 2.12E+08 | 3.97E+08 | 7.74E+08 | 1.23E+09 | 2.24E+08 | 4.20E+08 | 7.23E+08 | 1.07E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 4x | 6x | 12x | 18x | 4x | 6x | 11x | 16x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_f8533104

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB
MILC Total Time (Sec) Apex Medium no 67,299 5,533 3,073 1,582 890 5,487 3,044 1,661
MILC NRF Apex Medium yes 1x 13x 24x 47x 83x 13x 24x 44x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a1

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, Multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 16GB SXM2 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2 1x RTX6000 2x RTX6000 4x RTX6000 8x RTX6000 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 7.34 88.93 176.36 350.01 691.33 57.19 113.93 227.53 454.30 90.37 180.08 358.62 711.44
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 12x 24x 48x 94x 8x 16x 31x 62x 12x 25x 49x 97x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 7.18 91.30 180.99 358.35 706.56 58.69 117.10 233.96 465.77 91.78 182.39 363.21 722.71
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 13x 25x 50x 98x 8x 16x 33x 65x 13x 25x 51x 101x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 7.45 114.10 226.15 448.11 887.56 76.61 153.23 305.46 611.16 114.38 227.15 451.95 898.83
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 15x 30x 60x 119x 10x 21x 41x 82x 15x 30x 61x 121x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 0.65 8.25 16.47 32.38 63.92 5.11 10.21 20.39 40.20 8.14 16.22 32.32 63.43
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 13x 25x 50x 98x 8x 16x 31x 62x 13x 25x 50x 98x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 0.65 8.28 16.60 32.82 63.97 5.18 10.37 20.65 40.73 8.13 16.23 32.31 63.36
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 13x 26x 50x 98x 8x 16x 32x 63x 13x 25x 50x 97x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 0.68 9.40 18.83 37.18 72.16 6.20 12.39 24.73 48.63 9.08 18.14 36.01 70.55
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 14x 28x 55x 106x 9x 18x 36x 72x 13x 27x 53x 104x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 4x V100 32GB SXM2 4x V100S PCIe 32GB
NV-WRFg Seconds / Timestamps Conus_2.5k_JA no 5.51 0.62 0.68
NV-WRFg NRF Conus_2.5k_JA yes 1x 10x 9x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.5 (GPU) / 6.4.1 (CPU)

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 724 | 355 | 200 | 120 | 85 | 344 | 197 | 129 | 105
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 2x | 4x | 7x | 9x | 2x | 4x | 6x | 8x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.0.8

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 32GB SXM2 | 2x V100 32GB SXM2 | 4x V100 32GB SXM2 | 8x V100 32GB SXM2 | 1x V100S PCIe 32GB | 2x V100S PCIe 32GB | 4x V100S PCIe 32GB | 8x V100S PCIe 32GB
RELION [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 1.62E+05 | 2.82E+04 | 1.58E+04 | 9.88E+03 | 6.86E+03 | 2.86E+04 | 1.68E+04 | 1.06E+04 | 7.00E+03
RELION [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 7x | 10x | 17x | 24x | 7x | 10x | 16x | 23x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_02

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 16GB SXM2 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 41,587 82,898 165,782 331,629 42,877 85,665 171,015 342,458
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 4x 7x 15x 29x 4x 8x 15x 30x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 8,317 16,489 32,912 65,584 8,115 16,095 32,027 63,951
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 2x 4x 9x 17x 2x 4x 8x 17x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 7,736 15,369 30,567 60,944 8,496 16,893 33,597 67,051
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 2x 4x 8x 16x 2x 4x 9x 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0caec104

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 16GB SXM2 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2 1x V100S PCIe 32GB 2x V100S PCIe 32GB 4x V100S PCIe 32GB 8x V100S PCIe 32GB
SPECFEM3D Total Time (Sec) four_material_simple_model no 1,866 148 76 42 24 133 69 37 22
SPECFEM3D NRF four_material_simple_model yes 1x 14x 28x 51x 90x 16x 31x 57x 97x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

V 6.1

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

www.vasp.at

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] Elapsed Time (Sec) GaAsBi-512 no 3,098 1,175 653 368
VASP [GaAsBi-512 (DFT-FAST-REAL-STD)] NRF GaAsBi-512 yes 1x - 7x 14x
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] Elapsed Time (Sec) Si256_VJT_HSE06 no 3,315 1,100 600 307
VASP [Si256_VJT_HSE06 (HYB-DIROPT-REAL-GAM)] NRF Si256_VJT_HSE06 yes 1x 3x 6x 12x
VASP [Si-Huge (DFT-DAV-REAL-STD)] Elapsed Time (Sec) Si-Huge (DFT-DAV-REAL-STD) no 3,651 1,376 784 518
VASP [Si-Huge (DFT-DAV-REAL-STD)] NRF Si-Huge (DFT-DAV-REAL-STD) yes 1x 3x 5x 8x

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | FUN3D Benchmark: dpw_wbt0_crs-3.6Mn_5, CUDA Version: 11.0

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | RTM Benchmark: Isotropic Radius 4, CUDA Version: 11.0 | SPECFEM3D Benchmark: four_material_simple_model, CUDA Version: 11.0

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Relion Benchmark: Plasmodium Ribosome (2D), CUDA Version: 10.2 | AMBER Benchmark: DC-STMV_NPT, CUDA Version: 11.0

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe | Chroma Benchmark: szscl21_24_128, CUDA Version: 11.0 | GTC Benchmark: moi#proc.in, CUDA Version: 11.0 | MILC Benchmark: Apex Medium, CUDA Version: 11.0


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

20.00 Preview

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.28 66 131 263
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 15x 31x 61x
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.31 67 135 270
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 16x 31x 63x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 21.81 319 637 1,275
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 15x 29x 58x
AMBER [PME-FactorIX_NVE_4fs] ns/day DC-FactorIX_NVE yes 22.01 329 657 1,314
AMBER [PME-FactorIX_NVE_4fs] NRF DC-FactorIX_NVE yes 1x 15x 30x 60x
AMBER [PME-JAC_NPT_4fs] ns/day DC-JAC_NPT yes 93.04 1,159 2,319 4,638
AMBER [PME-JAC_NPT_4fs] NRF DC-JAC_NPT yes 1x 12x 25x 50x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 93.92 1,166 2,332 4,664
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 12x 25x 50x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.47 24 48 95
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 16x 32x 65x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V2020.0

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,130 | 116 | 37 | 25
Chroma | NRF | szscl21_24_128 | yes | 1x | 10x | 30x | 46x
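
For time-based metrics (the rows marked "Bigger is better: no"), the comparison runs the other way: the CPU-only time is divided by the GPU time. The following is a minimal sketch using the Chroma row. The name `time_speedup` is hypothetical, and the result is a plain wall-clock ratio, not the published NRF.

```python
def time_speedup(cpu_seconds, gpu_seconds):
    """Divide the CPU-only run time by each GPU run time (lower time is better)."""
    return {config: cpu_seconds / seconds for config, seconds in gpu_seconds.items()}


# Total Time (Sec) values copied from the Chroma szscl21_24_128 row above.
chroma = {"2x T4": 116, "4x T4": 37, "8x T4": 25}
speedups = time_speedup(1130, chroma)

for config, speedup in speedups.items():
    print(f"{config}: {speedup:.1f}x faster than the CPU-only server")
```

The ratios come out at roughly 9.7x, 30.5x and 45.2x, close to but not identical to the listed NRF of 10x, 30x and 46x. A small gap of this kind is plausible because NRF is derived from measurements on several CPU-only servers plus linear scaling, not from this single ratio.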

CryoSPARC

Microscopy

CryoSPARC is a state-of-the-art scientific software platform for cryo-electron microscopy (cryo-EM) used in research and drug discovery pipelines

VERSION

V2.11.0

ACCELERATED FEATURES

  • All-GPU application. The new cryoSPARC Live product allows streaming HPC reconstruction as the microscope acquires data.

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://cryosparc.com/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 4x T4 PCIe
CryoSPARC [class_2D_200] | Total Time (Sec) | class_2D_200 | no | 13,297 | 2,253
CryoSPARC [class_2D_200] | NRF | class_2D_200 | yes | 1x | 6x

FUN3D

Engineering

Suite of fluid-flow modeling tools actively developed at NASA for aeronautics and space technology.

VERSION

13.6

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 526 | 280 | 141 | 74
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 4x | 8x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020.1

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 126 | 212
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 5x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 41 | 57
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 4x
GROMACS [STMV] | ns/day | STMV | yes | 4 | 10 | 17
GROMACS [STMV] | NRF | STMV | yes | 1x | 2x | 4x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.5

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 231 | 458 | 909
GTC | NRF | moi#proc.in | yes | 1x | 7x | 13x | 26x

ICON

Weather and Climate

A global unified atmosphere model for numerical weather prediction and climate modeling research.

VERSION

2.6.0+RRTMGP

ACCELERATED FEATURES

  • Full model of dynamics and physics

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://code.mpimet.mpg.de/projects/iconpublic

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 130 | 38 | 23 | 19
ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 3x | 6x | 7x
ICON [SLAM 191 - 160KM - with radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 300 | 84 | 46 | 32
ICON [SLAM 191 - 160KM - with radiation] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 4x | 7x | 9x
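
The multi-GPU columns can also be read as a small strong-scaling study. The sketch below computes parallel efficiency relative to the smallest GPU count for the ICON run without radiation, where 1.0 would mean perfect scaling. The helper name `scaling_efficiency` is an assumption made for illustration, and using the 2x T4 run as the reference point is a choice of this example, not something the table prescribes.

```python
def scaling_efficiency(times_by_gpu_count):
    """Parallel efficiency relative to the smallest GPU count in the input.

    efficiency(N) = (t_base * n_base) / (t_N * N); 1.0 means perfect scaling.
    """
    base_gpus = min(times_by_gpu_count)
    base_time = times_by_gpu_count[base_gpus]
    return {
        gpus: (base_time * base_gpus) / (seconds * gpus)
        for gpus, seconds in times_by_gpu_count.items()
    }


# Integrate_nh (sec) values copied from the ICON "no radiation" row above.
icon_no_radiation = {2: 38, 4: 23, 8: 19}
efficiency = scaling_efficiency(icon_no_radiation)

for gpus, value in sorted(efficiency.items()):
    print(f"{gpus}x T4: {value:.2f}")
```

For this row the efficiency falls to about 0.83 at four GPUs and 0.50 at eight, which matches the flattening visible in the NRF values (3x, 6x, 7x).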

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

develop_f8533104

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 67,299 | 7,169 | 3,735 | 2,551
MILC | NRF | Apex Medium | yes | 1x | 10x | 20x | 29x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 3.0a1

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 7.34 | 52.52 | 105.03 | 211.48
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 7x | 14x | 29x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 7.18 | 52.74 | 105.74 | 212.68
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 7x | 15x | 30x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.45 | 67.76 | 135.58 | 273.04
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 9x | 18x | 37x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.65 | 4.38 | 8.76 | 17.54
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 7x | 13x | 27x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.65 | 4.41 | 8.83 | 17.67
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 7x | 14x | 27x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.68 | 5.14 | 10.28 | 20.56
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 8x | 15x | 30x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.51 | 0.90
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 7x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.0.8

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 1.62E+05 | 2.85E+04 | 1.59E+04 | 9.12E+03
Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 7x | 10x | 18x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2020_02

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 29,396 | 58,690 | 117,628
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 5x | 10x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 5,893 | 11,690 | 23,309
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 3x | 6x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 5,904 | 11,743 | 23,440
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 3x | 6x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_0caec104

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,866 | 208 | 105 | 56
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 10x | 20x | 38x