Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The Tesla V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To calculate the NRF, we measure application performance on up to 8 CPU-only servers, then use linear scaling to extrapolate beyond 8 servers. The NRF varies by application.
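As a rough illustration of the NRF calculation described above, the following Python sketch estimates how many CPU-only servers match one GPU-accelerated server. The function names, the interpolation inside the measured range, and the exact form of the linear extrapolation (per-server throughput held constant beyond the largest measured server count) are assumptions made for this example; the page does not specify these details, and the input numbers are hypothetical rather than taken from the tables below.

```python
def time_to_throughput(seconds):
    """Convert a 'smaller is better' time metric into a throughput (1/s)."""
    if seconds <= 0:
        raise ValueError("time must be positive")
    return 1.0 / seconds


def node_replacement_factor(cpu_throughput, gpu_throughput):
    """Estimate the number of CPU-only servers matching one GPU server.

    cpu_throughput: dict mapping CPU server count -> measured throughput
                    (higher is better), e.g. measured on 1 to 8 servers.
    gpu_throughput: throughput of the single GPU-accelerated server.
    """
    if not cpu_throughput or gpu_throughput <= 0:
        raise ValueError("need CPU measurements and a positive GPU throughput")

    # Inside the measured range: interpolate between adjacent measurements
    # (assumption for this sketch; the origin (0 servers, 0 throughput)
    # serves as the first point).
    prev_n, prev_t = 0, 0.0
    for n in sorted(cpu_throughput):
        t = cpu_throughput[n]
        if t >= gpu_throughput:
            frac = (gpu_throughput - prev_t) / (t - prev_t)
            return prev_n + frac * (n - prev_n)
        prev_n, prev_t = n, t

    # Beyond the measured range: linear scaling, assuming throughput keeps
    # growing in proportion to server count at the rate seen at the
    # largest measured configuration.
    per_server = prev_t / prev_n
    return gpu_throughput / per_server


# Hypothetical measurements on 1, 2, 4, and 8 CPU-only servers.
cpu_runs = {1: 10.0, 2: 20.0, 4: 38.0, 8: 70.0}
print(node_replacement_factor(cpu_runs, 140.0))
```

For metrics where smaller is better (for example, total time in seconds), `time_to_throughput` can be applied to each measurement before calling `node_replacement_factor`.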

HPC Benchmarks

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.0.130 for CloverLeaf, CUDA 10.1.243 for MiniFE

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.1.105 for Abaqus/Standard, CUDA 9.0.176 for ANSYS Fluent, CUDA 10.0.130 for FUN3D

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.1.243 for RTM, CUDA 10.1.105 for SPECFEM3D

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.1.243

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.0.130, CUDA 9.0.103 for QUDA, CUDA 10.1.243 for MILC

Quantum Mechanics

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 9.2.88, CUDA 10.0.130 for VASP


Detailed V100 application performance data is located below in alphabetical order.

Abaqus/Standard

Engineering

Simulation tool for analysis of structures

VERSION

2019

ACCELERATED FEATURES

  • Direct Sparse Solver
  • AMS Eigen Solver
  • Steady-state Dynamics Solver

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://www.3ds.com/products-services/simulia/products/abaqus/abaqusstandard/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
Abaqus/Standard | Total Time (Sec) | LS-EPP-Combined-WC-Mkl (RR) | no | 3,309 | 2,767 | 1,855 | 1,477 | 2,941 | 1,973 | 1,635
Abaqus/Standard | NRF | LS-EPP-Combined-WC-Mkl (RR) | yes | 1x | 1x | 2x | 2x | 1x | 2x | 2x

AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

18.17-AT

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
AMBER [DC-Cellulose_NVE] | ns/day | PME-Cellulose_NVE | yes | 4.73 | 100 | 199 | 398 | 797 | 106 | 212 | 424 | 847
AMBER [DC-Cellulose_NVE] | NRF | PME-Cellulose_NVE | yes | 1x | 21x | 42x | 84x | 168x | 22x | 45x | 90x | 179x
AMBER [DC-FactorIX_NPT] | ns/day | Factor IX (NPT) | yes | 22.88 | 391 | 782 | 1,563 | 3,126 | 415 | 831 | 1,661 | 3,322
AMBER [DC-FactorIX_NPT] | NRF | Factor IX (NPT) | yes | 1x | 17x | 34x | 68x | 137x | 18x | 36x | 73x | 145x
AMBER [DC-JAC_NVE] | ns/day | DHFR (NVE) (AKA JAC) | yes | 98.50 | 1,176 | 2,353 | 4,706 | 9,411 | 1,266 | 2,531 | 5,063 | 10,125
AMBER [DC-JAC_NVE] | NRF | DHFR (NVE) (AKA JAC) | yes | 1x | 12x | 24x | 48x | 96x | 13x | 26x | 51x | 103x
AMBER [DC-STMV_NPT] | ns/day | STMV (NPT) | yes | 1.66 | 32 | 64 | 129 | 257 | 36 | 67 | 134 | 268
AMBER [DC-STMV_NPT] | NRF | STMV (NPT) | yes | 1x | 19x | 39x | 77x | 155x | 20x | 40x | 81x | 161x

ANSYS Fluent

Engineering

General purpose software for the simulation of fluid dynamics

VERSION

19.2

ACCELERATED FEATURES

  • Pressure-based Coupled Solver and Radiation Heat Transfer

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.ansys.com/Products/Fluids/ANSYS-Fluent

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
ANSYS Fluent | Total Time (Sec) | Waterjacket | no | 1,216 | 1,119 | 927 | 606 | 1,034 | 792 | 667
ANSYS Fluent | NRF | Waterjacket | yes | 1x | 1x | 1x | 2x | 1x | 2x | 2x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

2018

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,083 | 1,118 | 72 | 16 | 17 | 1,098 | 70 | 18 | 12
Chroma | NRF | szscl21_24_128 | yes | 1x | 1x | 27x | 125x | 114x | 1x | 28x | 111x | 163x

CloverLeaf

Benchmark

Hydrodynamics

VERSION

1.3

ACCELERATED FEATURES

  • Lagrangian-Eulerian
  • Explicit hydrodynamics mini-application

SCALABILITY

Multi-Node (MPI)

MORE INFORMATION

https://uk-mac.github.io/CloverLeaf/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2
CloverLeaf | Wall Clock (Sec) | bm32 | no | 855 | - | 93 | 90 | - | 94
CloverLeaf | NRF | bm32 | yes | 1x | - | 10x | 10x | - | 9x

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology

VERSION

13.4

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 486 | 98 | 50 | 27 | 18 | 94 | 49 | 25 | 19
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 6x | 11x | 21x | 31x | 6x | 12x | 22x | 30x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2019.4

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 48.21 | 155 | 178 | 188 | 149 | 169 | 187
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 6x | 6x | 5x | 5x | 6x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 12.79 | 44 | 49 | 52 | 41 | 53 | 50
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 5x | 5x | 5x | 4x | 5x | 5x
GROMACS [STMV] | ns/day | STMV | yes | 2.63 | 10 | 16 | 12 | 10 | 14 | 14
GROMACS [STMV] | NRF | STMV | yes | 1x | 4x | 6x | 5x | 4x | 6x | 6x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

4.3

ACCELERATED FEATURES

  • Push, shift, and collision

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
GTC | Mpush/Sec | moi#proc.in | yes | 33 | 222 | 412 | 776 | 1,330 | 237 | 426 | 817 | 1,561
GTC | NRF | moi#proc.in | yes | 1x | 7x | 13x | 24x | 41x | 7x | 13x | 25x | 48x

HOOMD-Blue

Molecular Dynamics

Particle dynamics package written from the ground up for GPUs

VERSION

2.5.2

ACCELERATED FEATURES

  • CPU & GPU versions available

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 | 1x V100 32GB PCIe | 2x V100 32GB PCIe | 4x V100 32GB PCIe | 8x V100 32GB PCIe
HOOMD-Blue | Ave. TPS | microsphere | yes | 14.0 | 225 | 306 | 391 | 203 | 234 | 323 | 496 | 690 | 215 | 294 | 387 | 203
HOOMD-Blue | NRF | microsphere | yes | 1x | 18x | 25x | 31x | 16x | 19x | 26x | 40x | 56x | 17x | 24x | 31x | 16x

HPCG

Benchmark

Exercises computational and data access patterns that closely match a broad set of important HPC applications

VERSION

3

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.hpcg-benchmark.org/index.html

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe
HPCG | GFLOPS | 256x256x256 local size | yes | 31 | - | 293 | 576
HPCG | NRF | 256x256x256 local size | yes | 1x | - | 9x | 19x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

stable_5Jun2019

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.13E+08 | 3.00E+08 | 5.37E+08 | 1.03E+09 | 1.75E+09 | 3.10E+08 | 5.63E+08 | 1.18E+09 | 2.16E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 3x | 6x | 11x | 18x | 3x | 6x | 12x | 23x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.69E+07 | 1.08E+08 | 2.24E+08 | 4.02E+08 | 6.81E+08 | 1.15E+08 | 2.43E+08 | 4.34E+08 | 7.90E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 2x | 5x | 8x | 14x | 2x | 5x | 9x | 17x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.61E+05 | 1.61E+06 | 2.80E+06 | 4.59E+06 | 7.01E+06 | 1.74E+06 | 2.88E+06 | 4.62E+06 | 7.18E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 4x | 8x | 14x | 21x | 5x | 9x | 14x | 21x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 4.67E+07 | 1.94E+08 | 3.76E+08 | 6.70E+08 | 1.00E+09 | 2.26E+08 | 4.27E+08 | 7.85E+08 | 1.29E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 4x | 8x | 14x | 20x | 5x | 9x | 16x | 26x

Linpack

Benchmark

Measures floating point computing power

VERSION

2.1

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://www.top500.org/project/linpack/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe
Linpack | GFLOPS | HPL.dat NB=[256] for GPU server NB=[192] for CPU server | yes | 2,176 | - | 10,090 | 19,880
Linpack | NRF | HPL.dat NB=[256] for GPU server NB=[192] for CPU server | yes | 1x | - | 5x | 9x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

2019

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
MILC | Total Time (Sec) | Apex Medium | no | 70,111 | - | 3,072 | 1,589 | - | 3,168 | 1,572 | 913
MILC | NRF | Apex Medium | yes | 1x | - | 25x | 49x | - | 24x | 49x | 85x

MiniFE

Benchmark

Finite Element Analysis

VERSION

0.3

ACCELERATED FEATURES

  • All

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
MiniFE | Total CG Time (Sec) | 350x350x350 | no | 20.21 | 5.70 | 2.98 | 1.47 | 0.81 | 5.75 | 2.91 | 1.45 | 0.82
MiniFE | NRF | 350x350x350 | yes | 1x | 3x | 7x | 14x | 25x | 3x | 7x | 14x | 25x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

2.13

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 7.10 | 61.62 | 73.35 | 86.77 | 60.67 | 72.42 | 82.67
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 9x | 10x | 12x | 9x | 10x | 12x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 7.10 | 63.44 | 81.72 | 96.82 | 63.82 | 79.96 | 93.70
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 9x | 12x | 14x | 9x | 11x | 13x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.40 | 68.88 | 84.33 | 103.0 | 67.89 | 83.51 | 97.31
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 9x | 11x | 14x | 9x | 11x | 13x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.65 | 6.00 | 7.62 | 8.39 | 5.10 | 6.62 | 7.26
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 9x | 12x | 13x | 8x | 10x | 11x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.65 | 6.18 | 8.13 | 9.54 | 5.29 | 6.83 | 7.75
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 10x | 13x | 15x | 8x | 11x | 12x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.66 | 6.79 | 8.72 | 9.77 | 5.70 | 7.49 | 8.52
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 19x | 24x | 27x | 16x | 21x | 24x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

NV-WRFg 3.8.1

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 8x V100 16GB PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.51 | - | 0.74
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | - | 8x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.1

ACCELERATED FEATURES

  • Linear algebra (matrix multiply)
  • Explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 724.0 | - | 200 | 99 | 90 | - | 190 | 94 | 79
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | - | 4x | 8x | 9x | - | 4x | 9x | 10x

QUDA

Physics

A library for Lattice Quantum Chromodynamics (LQCD) on GPUs

VERSION

2017

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://usqcd-software.github.io/Level3.html#QUDA

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
QUDA | Dslash GFLOPS | QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 | yes | 106 | 1,429 | 2,672 | 4,761 | 5,238 | 1,422 | 2,664 | 5,024 | 6,292
QUDA | NRF | QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 | yes | 1x | 13x | 25x | 45x | 49x | 13x | 25x | 47x | 59x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
RELION | 1/Minutes | Plasmodium Ribosome on Relion-3.0 | yes | 2.47E-03 | 9.24E-03 | 1.47E-02 | 1.63E-02 | 9.32E-03 | 1.47E-02 | 1.59E-02
RELION | NRF | Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 6x | 7x | 4x | 6x | 6x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

2018

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 41,778 | 82,904 | 166,598 | 329,594 | 41,632 | 82,990 | 165,935 | 331,904
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 4x | 7x | 15x | 29x | 4x | 7x | 15x | 29x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 7,699 | 15,308 | 30,442 | 60,752 | 8,345 | 16,496 | 32,909 | 65,521
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 8x | 16x | 2x | 4x | 9x | 17x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 7,781 | 15,475 | 30,858 | 61,680 | 7,742 | 15,378 | 30,584 | 60,984
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 8x | 16x | 2x | 4x | 8x | 16x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

github_a2d23d27

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 2,114 | 149 | 77 | 41 | 24 | 148 | 77 | 41 | 24
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 17x | 32x | 60x | 103x | 17x | 32x | 61x | 105x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

5.4.4

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

www.vasp.at

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x V100 16GB PCIe | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 1x V100 16GB SXM2 | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
VASP [Si-Huge] | Elapsed Time (Sec) | Si-Huge | no | 3,535 | 1,869 | 1,595 | 1,125 | 1,959 | 1,702 | 1,342 | 1,331
VASP [Si-Huge] | NRF | Si-Huge | yes | 1x | 2x | 2x | 4x | 2x | 2x | 3x | 3x
VASP [B.hR105] | Elapsed Time (Sec) | B.hR105 | no | 408 | 201 | 123 | 80 | 204 | 125 | 84 | 75
VASP [B.hR105] | NRF | B.hR105 | yes | 1x | 2x | 3x | 5x | 2x | 3x | 5x | 5x

HPC Benchmarks

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.1.243

Engineering

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.0.130

Geoscience

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.1.243

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.1.243

Physics

CPU Server: Dual Xeon Gold 6240@2.60GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.1.243, CUDA 10.0.130 for GTC


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate molecular dynamics on biomolecules

VERSION

18.17-AT

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [DC-Cellulose_NVE] | ns/day | PME-Cellulose_NVE | yes | 4.7 | 65 | 130 | 261
AMBER [DC-Cellulose_NVE] | NRF | PME-Cellulose_NVE | yes | 1x | 14x | 28x | 55x
AMBER [DC-FactorIX_NPT] | ns/day | Factor IX (NPT) | yes | 23 | 299 | 598 | 1,197
AMBER [DC-FactorIX_NPT] | NRF | Factor IX (NPT) | yes | 1x | 13x | 26x | 52x
AMBER [DC-JAC_NVE] | ns/day | DHFR (NVE) (AKA JAC) | yes | 99 | 1,043 | 2,085 | 4,171
AMBER [DC-JAC_NVE] | NRF | DHFR (NVE) (AKA JAC) | yes | 1x | 11x | 21x | 42x
AMBER [DC-STMV_NPT] | ns/day | STMV (NPT) | yes | 1.7 | 22 | 44 | 89
AMBER [DC-STMV_NPT] | NRF | STMV (NPT) | yes | 1x | 13x | 27x | 53x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

2018

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,083 | 119 | 38 | 22
Chroma | NRF | szscl21_24_128 | yes | 1x | 17x | 52x | 90x

CloverLeaf

Benchmark

Hydrodynamics

VERSION

1.3

ACCELERATED FEATURES

  • Lagrangian-Eulerian explicit hydrodynamics mini-application

SCALABILITY

Multi-Node (MPI)

MORE INFORMATION

https://uk-mac.github.io/CloverLeaf/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe
CloverLeaf | Wall Clock (Sec) | bm32 | no | 855 | 437
CloverLeaf | NRF | bm32 | yes | 1x | 2x

FUN3D

Engineering

Suite of tools actively developed at NASA for modeling fluid flow in aeronautics and space technology

VERSION

13.4

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 486 | 29 | 139 | 72
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 20x | 4x | 8x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2019.4

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 48 | 188 | 168
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 5x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 13 | 52 | 43
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 4x
GROMACS [STMV] | ns/day | STMV | yes | 2.6 | 16 | 10
GROMACS [STMV] | NRF | STMV | yes | 1x | 7x | 4x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

4.3

ACCELERATED FEATURES

  • Push, shift, and collision

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 33 | 789 | 493 | 875
GTC | NRF | moi#proc.in | yes | 1x | 24x | 15x | 27x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

2019

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 70,111 | 1,603 | 3,888 | 2,053
MILC | NRF | Apex Medium | yes | 1x | 48x | 20x | 38x

MiniFE

Benchmark

Finite Element Analysis

VERSION

0.3

ACCELERATED FEATURES

  • All

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MiniFE | Total CG Time (Sec) | 350x350x350 | no | 20.2 | 1.5 | 3.7 | 2.0
MiniFE | NRF | 350x350x350 | yes | 1x | 13x | 6x | 10x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

2.13

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 7.1 | 87 | 71
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 12x | 10x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 7.1 | 97 | 75
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 14x | 11x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 7.4 | 102 | 81
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 14x | 11x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.7 | 8 | 3
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 13x | 5x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.7 | 9 | 3
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 14x | 5x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.7 | 10 | 4
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 28x | 10x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

NV-WRFg 3.8.1

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 8x T4 PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.5 | 1.1
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 5x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RELION | 1/Minutes | Plasmodium Ribosome on Relion-3.0 | yes | 2.47E-03 | 1.23E-02 | 1.57E-02 | 1.66E-02
RELION | NRF | Plasmodium Ribosome on Relion-3.0 | yes | 1x | 5x | 6x | 7x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

2018

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 148,708 | 58,700 | 117,772
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 13x | 5x | 10x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 28,787 | 11,697 | 23,297
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 8x | 3x | 6x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 28,625 | 11,732 | 23,417
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 8x | 3x | 6x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

dvel_b7ed7a33

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 2,114 | 44 | 106 | 57
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 56x | 23x | 44x