Deep learning performance data is published on a separate page.


Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The NVIDIA V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers that a single GPU-accelerated server replaces is called the node replacement factor (NRF). To arrive at the NRF, we measure application performance on up to 8 CPU-only servers, then extrapolate linearly beyond 8 servers. The NRF varies by application.
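The extrapolation step can be sketched as follows (a minimal illustration of the methodology described above, not NVIDIA's actual measurement harness; all throughput numbers in it are hypothetical):

```python
# Sketch of the NRF methodology: measure CPU-only throughput on 1..8
# servers, fit a linear trend, then find how many CPU servers the
# fitted trend needs to match one GPU-accelerated server.
# All throughput numbers below are hypothetical.

def fit_linear_slope(server_counts, throughputs):
    # Least-squares slope through the origin: throughput ~ slope * servers.
    num = sum(n * t for n, t in zip(server_counts, throughputs))
    den = sum(n * n for n in server_counts)
    return num / den

def node_replacement_factor(slope, gpu_throughput):
    # CPU servers needed on the fitted trend to match the GPU server.
    return gpu_throughput / slope

# Hypothetical measurements (ns/day) on 1, 2, 4, 8 CPU-only servers:
cpu_counts = [1, 2, 4, 8]
cpu_perf = [4.7, 9.3, 18.5, 36.8]   # near-linear CPU scaling
gpu_perf = 353.0                    # one GPU-accelerated server, same test

slope = fit_linear_slope(cpu_counts, cpu_perf)
print(f"NRF = {node_replacement_factor(slope, gpu_perf):.0f}x")  # NRF = 77x
```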

HPC Benchmarks

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x V100S PCIe, CUDA Version: CUDA 10.1.243 for CloverLeaf and with 4x NVIDIA V100 SXM2, CUDA Version: CUDA 10.1.243 for MiniFE

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2, CUDA Version: CUDA 10.1.243 for FUN3D

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz 4x NVIDIA V100 SXM2, CUDA Version: CUDA 10.2.89

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x NVIDIA V100 SXM2, CUDA Version: CUDA 10.1.243 for NAMD; Dual Xeon Gold 6240@2.60GHz with 2x NVIDIA V100 SXM2, CUDA Version: CUDA 10.2 for RELION; Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2, CUDA Version: CUDA 10.2 for LAMMPS, CUDA 10.2.89 for GROMACS and HOOMD-Blue

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2, CUDA Version: CUDA 10.2.89 for Chroma, CUDA 10.0.130 for GTC, CUDA 10.1.243 for MILC; Dual Xeon E5-2698 v4@2.20GHz with 4x NVIDIA V100 SXM2, CUDA Version: CUDA 9.0.103 for QUDA

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: Dual Xeon Gold 6240@2.60GHz with 4x NVIDIA V100 SXM2, CUDA Version: CUDA 10.1.237 for Quantum Espresso; Dual Xeon E5-2698 v4@2.20GHz with 4x NVIDIA V100 SXM2, CUDA Version: CUDA 10.0.130 for VASP


Detailed V100 application performance data is located below in alphabetical order.

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

18.17-AT_19.9

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
AMBER [PME-Cellulose_NPT_4fs] ns/day DC-Cellulose_NPT yes 4.73 88.29 176.58 353.16 706.32 92.05 184.10 368.20 736.40
AMBER [PME-Cellulose_NPT_4fs] NRF DC-Cellulose_NPT yes 1x 19x 37x 75x 149x 19x 39x 78x 156x
AMBER [DC-Cellulose_NVE] ns/day PME-Cellulose_NVE yes 4.73 100 199 399 798 104 209 417 834
AMBER [DC-Cellulose_NVE] NRF PME-Cellulose_NVE yes 1x 21x 42x 84x 169x 22x 44x 88x 176x
AMBER [DC-FactorIX_NPT] ns/day FactorIX (NPT) yes 22.88 400 800 1,601 3,202 416 831 1,662 3,325
AMBER [DC-FactorIX_NPT] NRF FactorIX (NPT) yes 1x 17x 35x 70x 140x 18x 36x 73x 145x
AMBER [DC-FactorIX_NVE] ns/day FactorIX (NVE) yes 23.41 454 909 1,818 3,636 471 941 1,882 3,765
AMBER [DC-FactorIX_NVE] NRF FactorIX (NVE) yes 1x 19x 39x 78x 155x 20x 40x 80x 161x
AMBER [DC-JAC_NPT] ns/day DC-JAC_NPT yes 96.82 1,085 2,170 4,339 8,679 1,133 2,266 4,532 9,065
AMBER [DC-JAC_NPT] NRF DC-JAC_NPT yes 1x 11x 22x 45x 90x 12x 23x 47x 94x
AMBER [DC-JAC_NVE] ns/day DHFR (NVE) (AKA JAC) yes 98.50 1,217 2,433 4,867 9,734 1,274 2,549 5,097 10,195
AMBER [DC-JAC_NVE] NRF DHFR (NVE) (AKA JAC) yes 1x 12x 25x 49x 99x 13x 26x 52x 104x
AMBER [DC-STMV_NPT] ns/day STMV (NPT) yes 1.66 32 63 127 254 33 66 132 263
AMBER [DC-STMV_NPT] NRF STMV (NPT) yes 1x 19x 38x 76x 153x 20x 40x 79x 159x
AMBER [FEP-GTI_Complex 1fs] ns/day FEP-GTI_Complex yes 10.51 133 266 533 1,066 139 278 557 1,114
AMBER [FEP-GTI_Complex 1fs] NRF FEP-GTI_Complex yes 1x 13x 25x 51x 101x 13x 26x 53x 106x
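For "bigger is better" throughput metrics such as ns/day, the NRF rows in the table above are consistent with a simple ratio against the single CPU-only server baseline. A quick spot check in Python, using the PME-Cellulose_NPT numbers from the table:

```python
# Spot-check: for throughput metrics, NRF ~ GPU throughput divided by
# the single CPU-server baseline. Numbers are taken from the
# PME-Cellulose_NPT row above (ns/day).

cpu_baseline = 4.73
v100_sxm2 = [88.29, 176.58, 353.16, 706.32]   # 1x, 2x, 4x, 8x V100 SXM2

nrf = [round(t / cpu_baseline) for t in v100_sxm2]
print(nrf)  # [19, 37, 75, 149], matching the NRF row
```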

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2019.2

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe
Chroma Total Time (Sec) szscl21_24_128 no 1,119 165 31 17 10 156 27 15
Chroma NRF szscl21_24_128 yes 1x 12x 66x 123x 207x 13x 75x 138x

CloverLeaf

Benchmark

Hydrodynamics

VERSION

1.3_master_0d385ec_fix1

ACCELERATED FEATURES

  • Lagrangian-Eulerian
  • explicit hydrodynamics mini-application

SCALABILITY

Multi-Node (MPI)

MORE INFORMATION

https://uk-mac.github.io/CloverLeaf/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
CloverLeaf Wall Clock (Sec) bm32 no 855 185 100 37 151 83 80 32
CloverLeaf NRF bm32 yes 1x NA 9x 24x 6x 11x 11x 28x

FUN3D

Engineering

Suite of computational fluid dynamics tools actively developed at NASA in support of its aeronautics and space technology work.

VERSION

13.6

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 526 96 49 26 17 86 44 23 16
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 6x 13x 24x 36x 7x 14x 26x 40x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 1x RTX6000 2x RTX6000 4x RTX6000 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe
GROMACS [ADH Dodec] ns/day ADH Dodec yes 48.07 196 250 451 201 241 272 204 262 299
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 6x 7x 13x 6x 7x 8x 6x 8x 9x
GROMACS [Cellulose] ns/day Cellulose yes 13.62 57 96 146 53 74 78 60 86 -
GROMACS [Cellulose] NRF Cellulose yes 1x 5x 11x 17x 5x 8x 9x 5x 10x -
GROMACS [STMV] ns/day STMV yes 3.50 11.81 25.52 40.98 11.33 23.93 29.84 11.94 25.29 34.31
GROMACS [STMV] NRF STMV yes 1x 3x 7x 12x 3x 7x 9x 3x 7x 10x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.3

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
GTC Mpush/Sec moi#proc.in yes 33 234 433 840 1,449 243 456 870 1,441
GTC NRF moi#proc.in yes 1x 7x 13x 26x 45x 7x 14x 27x 44x

HOOMD-blue

Molecular Dynamics

Particle dynamics package written from the ground up for GPUs

VERSION

V 2.8.0

ACCELERATED FEATURES

  • CPU & GPU versions available
Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe
HOOMD-blue [microsphere] Ave. TPS microsphere yes 13.63 221.99 327.02 537.21 709.99 239.83 325.79 412.32
HOOMD-blue [microsphere] NRF microsphere yes 1x 19x 28x 46x 60x 20x 28x 35x

HPCG

Benchmark

Exercises computational and data access patterns that closely match a broad set of important HPC applications

VERSION

NA

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.hpcg-benchmark.org/index.html

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x V100 16GB PCIe 4x V100 16GB PCIe
HPCG GFLOPS 256x256x256 local size yes 31 293 576
HPCG NRF 256x256x256 local size yes 1x 9x 19x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

patch_20Nov2019

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials
Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
LAMMPS [LJ 2.5] ATOM-Time Steps/s LJ 2.5 yes 1.24E+08 2.92E+08 5.35E+08 1.16E+09 2.12E+09 2.92E+08 5.17E+08 1.08E+09 1.85E+09
LAMMPS [LJ 2.5] NRF LJ 2.5 yes 1x 2x 5x 11x 20x 2x 5x 10x 17x
LAMMPS [EAM] ATOM-Time Steps/s EAM yes 6.15E+07 1.06E+08 2.22E+08 4.30E+08 7.63E+08 1.08E+08 2.19E+08 4.01E+08 6.94E+08
LAMMPS [EAM] NRF EAM yes 1x 2x 4x 8x 15x 2x 4x 8x 14x
LAMMPS [ReaxFF/C] ATOM-Time Steps/s ReaxFF/C yes 4.55E+05 1.56E+06 2.72E+06 4.55E+06 7.00E+06 1.64E+06 2.85E+06 4.65E+06 7.25E+06
LAMMPS [ReaxFF/C] NRF ReaxFF/C yes 1x 4x 8x 13x 21x 4x 8x 14x 21x
LAMMPS [Tersoff] ATOM-Time Steps/s Tersoff yes 5.83E+07 2.12E+08 3.97E+08 7.74E+08 1.23E+09 2.24E+08 4.20E+08 7.23E+08 1.07E+09
LAMMPS [Tersoff] NRF Tersoff yes 1x 4x 6x 12x 18x 4x 6x 11x 16x

Linpack

Benchmark

Measures floating point computing power

VERSION

NA

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://www.top500.org/project/linpack/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x V100 16GB PCIe 4x V100 16GB PCIe
Linpack GFLOPS HPL.dat NB=[256] for GPU server NB=[192] for CPU server yes 2,176 10,090 19,880
Linpack NRF HPL.dat NB=[256] for GPU server NB=[192] for CPU server yes 1x 5x 9x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/quda-hisq-fusion_ca50f1ad

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
MILC Total Time (Sec) Apex Medium no 70,111 5,877 3,206 1,558 883 5,112 2,838 1,390 1,283
MILC NRF Apex Medium yes 1x 13x 24x 50x 87x 15x 27x 56x 60x

MiniFE

Benchmark

Finite Element Analysis

VERSION

0.3_update01

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU

MORE INFORMATION

https://github.com/Mantevo/miniFE

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
MiniFE Total CG Time (Sec) 350x350x350 no 20.21 6.11 3.10 1.54 0.88 5.07 2.57 1.29 0.74
MiniFE NRF 350x350x350 yes 1x NA 6x 13x 23x 4x 8x 16x 27x
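For "smaller is better" timing metrics, the same spot check runs in the other direction: NRF ~ CPU-only time divided by GPU time. Using the MiniFE Total CG Time row above:

```python
# Spot-check for time-based metrics: NRF ~ CPU-only time / GPU time.
# Numbers from the MiniFE 350x350x350 row above (Total CG Time, seconds).

cpu_time = 20.21
v100s_times = [5.07, 2.57, 1.29, 0.74]   # 1x, 2x, 4x, 8x V100S PCIe

nrf = [round(cpu_time / t) for t in v100s_times]
print(nrf)  # [4, 8, 16, 27], matching the NRF row
```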

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 2.13

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M-atom capable; multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 16GB SXM2 2x V100 16GB SXM2 4x V100 16GB SXM2 1x RTX6000 2x RTX6000 4x RTX6000 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 7.10 60.82 72.06 82.82 55.22 76.37 88.18 66.18 77.46 90.28
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 9x 10x 12x 8x 11x 12x 9x 11x 13x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 7.10 63.61 79.48 93.99 56.56 81.07 98.65 68.41 84.49 102.19
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 9x 11x 13x 8x 11x 14x 10x 12x 14x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 7.40 67.85 83.94 97.24 61.53 88.49 103.28 74.64 89.95 107.33
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 9x 11x 13x 8x 12x 14x 10x 12x 15x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 0.65 5.12 6.52 7.24 5.14 7.44 8.70 6.09 7.76 8.55
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 8x 10x 11x 8x 11x 13x 9x 12x 13x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 0.65 5.26 6.89 7.70 5.19 7.88 9.41 6.44 8.16 9.22
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 8x 11x 12x 8x 12x 14x 10x 13x 14x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 0.66 5.68 7.43 8.55 5.65 8.55 10.11 6.97 8.86 9.71
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 16x 21x 24x 16x 24x 28x 20x 25x 27x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 8x V100 16GB PCIe
NV-WRFg Seconds / Timestep Conus_2.5k_JA no 5.51 0.74
NV-WRFg NRF Conus_2.5k_JA yes 1x 8x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.1_gpu_pub

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe
Quantum Espresso Total CPU Time (Sec) AUSURF112-jR no 724 276 140 90 84 267 136 96
Quantum Espresso NRF AUSURF112-jR yes 1x 3x 6x 9x 10x 3x 6x 8x

QUDA

Physics

A library for Lattice Quantum Chromo Dynamics on GPUs

VERSION

NA

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://usqcd-software.github.io/Level3.html#QUDA

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 16GB SXM2 2x V100 16GB SXM2 4x V100 16GB SXM2 8x V100 16GB SXM2 1x V100 16GB PCIe 2x V100 16GB PCIe 4x V100 16GB PCIe 8x V100 16GB PCIe
QUDA Dslash GFLOPS QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 yes 106 1,422 2,664 5,024 6,292 1,429 2,672 4,761 5,238
QUDA NRF QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 yes 1x 13x 25x 47x 59x 13x 25x 45x 49x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.0.7

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x RTX6000 2x RTX6000 4x RTX6000 8x RTX6000 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
RELION Total Wall Clock (Sec) Plasmodium Ribosome (2D) no 1.80E+05 2.84E+04 1.56E+04 9.88E+03 6.79E+03 2.81E+04 1.55E+04 1.00E+04 6.65E+03 2.85E+04 1.58E+04 1.02E+04 6.77E+03
RELION NRF Plasmodium Ribosome (2D) yes 1x 8x 12x 18x 27x 8x 12x 18x 27x 8x 11x 18x 27x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2018_09

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 36,450 72,627 145,218 290,738 42,954 85,745 171,318 343,195
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 3x 6x 13x 26x 4x 8x 15x 30x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 7,613 15,104 29,889 59,488 8,518 16,917 33,626 67,056
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 2x 4x 8x 16x 2x 4x 9x 18x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 7,592 15,100 29,859 59,550 8,514 16,934 33,623 67,039
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 2x 4x 8x 16x 2x 4x 9x 18x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_8926d3d3

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 32GB SXM2 2x V100 32GB SXM2 4x V100 32GB SXM2 8x V100 32GB SXM2 1x V100S 32GB PCIe 2x V100S 32GB PCIe 4x V100S 32GB PCIe 8x V100S 32GB PCIe
SPECFEM3D Total Time (Sec) four_material_simple_model no 2,114 159 83 43 25 133 69 37 22
SPECFEM3D NRF four_material_simple_model yes 1x 16x 30x 57x 98x 19x 36x 67x 113x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

V 5.4.4

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST)
  • RMM-DIIS (ALGO = VERYFAST & FAST)
  • K-Points and optimization

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

www.vasp.at

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 1x V100 16GB SXM2 2x V100 16GB SXM2 4x V100 16GB SXM2 1x V100 16GB PCIe 2x V100 16GB PCIe 4x V100 16GB PCIe
VASP [Si-Huge] Elapsed Time (Sec) Si-Huge no 3,535 1,959 1,702 1,342 1,869 1,595 1,125
VASP [Si-Huge] NRF Si-Huge yes 1x 2x 2x 3x 2x 2x 4x
VASP [B.hR105] Elapsed Time (Sec) B.hR105 no 408 204 125 84 201 123 80
VASP [B.hR105] NRF B.hR105 yes 1x 2x 3x 5x 2x 3x 5x

HPC Benchmarks

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe, CUDA Version: CUDA 10.1.243

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe, CUDA Version: CUDA 10.1.243

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz | GPU Server: same CPU server with 4x NVIDIA T4 PCIe, CUDA Version: CUDA 10.2.89

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x NVIDIA T4 PCIe, CUDA Version: CUDA 10.2 for RELION;

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x NVIDIA T4 PCIe; CUDA Version: CUDA 10.2.89 for Chroma, CUDA 10.0.130 for GTC


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

18.17-AT_19.9

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
AMBER [PME-Cellulose_NVE_4fs] ns/day DC-Cellulose_NVE yes 4.7 67 134 268
AMBER [PME-Cellulose_NVE_4fs] NRF DC-Cellulose_NVE yes 1x 14x 28x 57x
AMBER [PME-FactorIX_NPT_4fs] ns/day DC-FactorIX_NPT yes 23 304 608 1,215
AMBER [PME-FactorIX_NPT_4fs] NRF DC-FactorIX_NPT yes 1x 13x 27x 53x
AMBER [PME-JAC_NVE_4fs] ns/day DC-JAC_NVE yes 99 1,081 2,161 4,322
AMBER [PME-JAC_NVE_4fs] NRF DC-JAC_NVE yes 1x 11x 22x 44x
AMBER [PME-STMV_NPT_4fs] ns/day DC-STMV_NPT yes 1.7 23 45 90
AMBER [PME-STMV_NPT_4fs] NRF DC-STMV_NPT yes 1x 14x 27x 54x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

V 2019.2

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition
Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
Chroma Total Time (Sec) szscl21_24_128 no 1,119 116 38 25
Chroma NRF szscl21_24_128 yes 1x 18x 53x 82x

CloverLeaf

Benchmark

Hydrodynamics

VERSION

1.3_master_0d385ec_fix1

ACCELERATED FEATURES

  • Lagrangian-Eulerian
  • explicit hydrodynamics mini-application

SCALABILITY

Multi-Node (MPI)

MORE INFORMATION

https://uk-mac.github.io/CloverLeaf/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
CloverLeaf Wall Clock (Sec) bm32 no 855 437 225 121
CloverLeaf NRF bm32 yes 1x 2x 4x 7x

FUN3D

Engineering

Suite of computational fluid dynamics tools actively developed at NASA in support of its aeronautics and space technology work.

VERSION

13.6

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
FUN3D Loop Time (Sec) dpw_wbt0_crs-3.6Mn_5 no 526 281 142 74
FUN3D NRF dpw_wbt0_crs-3.6Mn_5 yes 1x 2x 4x 8x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2020

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe
GROMACS [ADH Dodec] ns/day ADH Dodec yes 48 123 210
GROMACS [ADH Dodec] NRF ADH Dodec yes 1x 3x 6x
GROMACS [Cellulose] ns/day Cellulose yes 14 39 57
GROMACS [Cellulose] NRF Cellulose yes 1x 2x 5x
GROMACS [STMV] ns/day STMV yes 4 10 17
GROMACS [STMV] NRF STMV yes 1x 3x 5x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

V 4.3

ACCELERATED FEATURES

  • Push, shift, and collision

SCALABILITY

Multi-GPU and Multi-Node

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
GTC Mpush/Sec moi#proc.in yes 33 251 493 875
GTC NRF moi#proc.in yes 1x 8x 15x 27x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

feature/quda-hisq-fusion_ca50f1ad

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
MILC Total Time (Sec) Apex Medium no 70,111 7,504 3,888 2,053
MILC NRF Apex Medium yes 1x 10x 20x 38x

MiniFE

Benchmark

Finite Element Analysis

VERSION

0.3_update01

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU

MORE INFORMATION

https://github.com/Mantevo/miniFE

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
MiniFE Total CG Time (Sec) 350x350x350 no 20.2 7.2 3.7 2
MiniFE NRF 350x350x350 yes 1x 3x 6x 10x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

V 2.13

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M-atom capable; multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 4x T4 PCIe
NAMD [apoa1_npt_cuda] Ave ns/day apoa1_npt_cuda yes 7.1 71
NAMD [apoa1_npt_cuda] NRF apoa1_npt_cuda yes 1x 10x
NAMD [apoa1_nptsr_cuda] Ave ns/day apoa1_nptsr_cuda yes 7.1 76
NAMD [apoa1_nptsr_cuda] NRF apoa1_nptsr_cuda yes 1x 11x
NAMD [apoa1_nve_cuda] Ave ns/day apoa1_nve_cuda yes 7.4 81
NAMD [apoa1_nve_cuda] NRF apoa1_nve_cuda yes 1x 11x
NAMD [stmv_npt_cuda] Ave ns/day stmv_npt_cuda yes 0.7 3
NAMD [stmv_npt_cuda] NRF stmv_npt_cuda yes 1x 5x
NAMD [stmv_nptsr_cuda] Ave ns/day stmv_nptsr_cuda yes 0.7 3
NAMD [stmv_nptsr_cuda] NRF stmv_nptsr_cuda yes 1x 5x
NAMD [stmv_nve_cuda] Ave ns/day stmv_nve_cuda yes 0.7 4
NAMD [stmv_nve_cuda] NRF stmv_nve_cuda yes 1x 10x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 8x T4 PCIe
NV-WRFg Seconds / Timestep Conus_2.5k_JA no 5.5 1.1
NV-WRFg NRF Conus_2.5k_JA yes 1x 5x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3.0.7

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
RELION Total Wall Clock (Sec) Plasmodium Ribosome (2D) no 1.80E+05 2.65E+04 1.53E+04 9.09E+03
RELION NRF Plasmodium Ribosome (2D) yes 1x 8x 12x 20x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

nvidia_2018_09

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
RTM [Isotropic Radius 4] Mcells/s Isotropic Radius 4 yes 11,318 29,423 58,824 117,827
RTM [Isotropic Radius 4] NRF Isotropic Radius 4 yes 1x 3x 5x 10x
RTM [TTI Radius 8 1-pass] Mcells/s TTI Radius 8 1-pass yes 3,773 5,900 11,744 23,449
RTM [TTI Radius 8 1-pass] NRF TTI Radius 8 1-pass yes 1x 2x 3x 6x
RTM [TTI RX 2Pass mgpu] Mcells/s TTI RX 2Pass mgpu yes 3,773 5,918 11,701 23,346
RTM [TTI RX 2Pass mgpu] NRF TTI RX 2Pass mgpu yes 1x 2x 3x 6x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

devel_8926d3d3

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application Metric Test Modules Bigger is better Dual Cascade Lake 6240 (CPU-Only) 2x T4 PCIe 4x T4 PCIe 8x T4 PCIe
SPECFEM3D Total Time (Sec) four_material_simple_model no 2,114 209 105 57
SPECFEM3D NRF four_material_simple_model yes 1x 12x 23x 43x