

Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. The Tesla V100 and T4 GPUs fundamentally change the economics of the data center, delivering breakthrough performance with dramatically fewer servers, less power consumption, and reduced networking overhead, resulting in total cost savings of 5X-10X.

The number of CPU-only servers replaced by a single GPU-accelerated server is called the node replacement factor (NRF). To calculate the NRF, we measure application performance on up to 8 CPU-only servers, then use linear scaling to extrapolate beyond 8 servers. The NRF varies by application.
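
The NRF procedure described above can be illustrated with a minimal Python sketch. It is not the tooling used to produce the published figures. The function name, the interpolation inside the measured range, and the sample throughput values are all assumptions made for illustration; the only rule taken from the text is linear scaling beyond the largest measured CPU server count. For time-based metrics (where bigger is not better), pass 1 / time as the throughput.

```python
def node_replacement_factor(cpu_throughput_by_nodes, gpu_throughput):
    """Estimate how many CPU-only servers match one GPU-accelerated server.

    cpu_throughput_by_nodes maps a CPU server count (for example 1..8) to
    the measured throughput at that count, where bigger is better.
    """
    prev_nodes, prev_tput = 0, 0.0
    for nodes in sorted(cpu_throughput_by_nodes):
        tput = cpu_throughput_by_nodes[nodes]
        if tput >= gpu_throughput:
            # The GPU server falls inside the measured range: interpolate
            # between the two neighbouring measurements (an assumption).
            frac = (gpu_throughput - prev_tput) / (tput - prev_tput)
            return prev_nodes + frac * (nodes - prev_nodes)
        prev_nodes, prev_tput = nodes, tput
    # Beyond the largest measured count, assume throughput keeps growing
    # in proportion to the number of servers (linear scaling).
    return prev_nodes * gpu_throughput / prev_tput


# Hypothetical measurements, NOT taken from the tables on this page.
hypothetical_cpu = {1: 10.0, 2: 19.0, 4: 36.0, 8: 64.0}

inside_range = node_replacement_factor(hypothetical_cpu, 27.5)
beyond_range = node_replacement_factor(hypothetical_cpu, 128.0)
print(inside_range, beyond_range)
```

Because CPU-only scaling is often sublinear across servers, an NRF computed this way can be much larger than the simple single-server speedup, which is one reason the NRF rows in the tables do not always equal the ratio of the GPU and CPU columns.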

HPC Benchmarks

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 9.2.148

Engineering

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: same CPU server with 4x Tesla V100 PCIe or Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 9.2.148

Geoscience

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 10.0.1

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 9.2.148, CUDA 10.0.1 for LAMMPS and NAMD

Physics

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 9.2.148, CUDA 10.0.1 for MILC

Quantum Mechanics

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 9.2.148

Weather and Climate

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: Dual Xeon E5-2698 v4@2.20GHz with 4x Tesla V100 SXM2; CUDA Version: CUDA 9.2.148


Detailed V100 application performance data is located below in alphabetical order.

Abaqus/Standard

Engineering

Simulation tool for analysis of structures

VERSION

2017

ACCELERATED FEATURES

  • Direct Sparse Solver
  • AMS Eigen Solver
  • Steady-state Dynamics Solver

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
Abaqus/Standard | Total Time (Sec) | LS-EPP-Combined-WC-Mkl (RR) | no | 4,376 | 1,856 | 1,479 | 1,856 | 1,479
Abaqus/Standard | NRF | LS-EPP-Combined-WC-Mkl (RR) | yes | 1x | 4x | 9x | 4x | 9x

AMBER

Molecular Dynamics

Suite of programs to simulate the molecular dynamics of biomolecules

VERSION

18.13-AT

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
AMBER [DC-Cellulose_NVE] | ns/day | PME-Cellulose_NVE | yes | 4.5 | 192 | 384 | 768 | 204 | 408 | 816
AMBER [DC-Cellulose_NVE] | NRF | PME-Cellulose_NVE | yes | 1x | 43x | 85x | 171x | 45x | 91x | 181x
AMBER [DC-FactorIX_NPT] | ns/day | Factor IX (NPT) | yes | 22 | 760 | 1,520 | 3,040 | 810 | 1,620 | 3,240
AMBER [DC-FactorIX_NPT] | NRF | Factor IX (NPT) | yes | 1x | 35x | 69x | 138x | 37x | 74x | 147x
AMBER [DC-JAC_NVE] | ns/day | DHFR (NVE) (AKA JAC) | yes | 93 | 2,192 | 4,384 | 8,768 | 2,446 | 4,892 | 9,784
AMBER [DC-JAC_NVE] | NRF | DHFR (NVE) (AKA JAC) | yes | 1x | 24x | 47x | 94x | 26x | 53x | 105x
AMBER [DC-STMV_NPT] | ns/day | STMV (NPT) | yes | 1.6 | 60 | 120 | 240 | 66 | 132 | 264
AMBER [DC-STMV_NPT] | NRF | STMV (NPT) | yes | 1x | 38x | 75x | 150x | 41x | 83x | 165x

ANSYS Fluent

Engineering

General purpose software for the simulation of fluid dynamics

VERSION

19.2

ACCELERATED FEATURES

  • Pressure-based Coupled Solver and Radiation Heat Transfer

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.ansys.com/Products/Fluids/ANSYS-Fluent

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
ANSYS Fluent | Total Time (Sec) | Waterjacket | no | 1,222 | 764 | 591 | 812 | 669
ANSYS Fluent | NRF | Waterjacket | yes | 1x | 3x | 4x | 3x | 3x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

2018

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://jeffersonlab.github.io/chroma/

https://ngc.nvidia.com/catalog/containers/hpc:chroma

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,140 | 75 | 19 | 18 | 81 | 19 | 14
Chroma | NRF | szscl21_24_128 | yes | 1x | 28x | 109x | 116x | 26x | 109x | 149x

CloverLeaf

Benchmark

Hydrodynamics

VERSION

1.3

ACCELERATED FEATURES

  • Lagrangian-Eulerian explicit hydrodynamics mini-application

SCALABILITY

Multi-Node (MPI)

MORE INFORMATION

https://uk-mac.github.io/CloverLeaf/

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
CloverLeaf | Wall Clock (Sec) | bm32 | no | 2,520 | 120 | 104 | 113 | 100
CloverLeaf | NRF | bm32 | yes | 1x | 16x | 18x | 17x | 19x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.3

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 612 | 50 | 26 | 18 | 49 | 25 | 18
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 14x | 26x | 38x | 14x | 27x | 38x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2019.1

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 54 | 176 | 193 | 175 | 201
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 5x | 9x | 5x | 9x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 15 | 50 | 55 | 54 | 53
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 5x | 5x | 5x | 5x
GROMACS [STMV -"Puregpu" Sandbox] | ns/day | STMV | yes | 3.5 | - | - | 15 | 34
GROMACS [STMV -"Puregpu" Sandbox] | NRF | STMV | yes | 1x | - | - | 5x | 11x
GROMACS [STMV] | ns/day | STMV | yes | 3.5 | 16 | 14 | 15 | 15
GROMACS [STMV] | NRF | STMV | yes | 1x | 5x | 4x | 5x | 5x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

4.3

ACCELERATED FEATURES

  • Push, shift, and collision

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 439 | 846 | 1,430 | 457 | 877 | 1,667
GTC | NRF | moi#proc.in | yes | 1x | 13x | 25x | 41x | 13x | 25x | 48x

HOOMD-Blue

Molecular Dynamics

Particle dynamics package written from the ground up for GPUs

VERSION

2.2.2

ACCELERATED FEATURES

  • CPU & GPU versions available

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://codeblue.umich.edu/hoomd-blue/index.html

https://ngc.nvidia.com/catalog/containers/hpc:hoomd-blue

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2 | 2x V100 32GB PCIe | 4x V100 32GB PCIe | 8x V100 32GB PCIe
HOOMD-Blue | Ave. TPS | microsphere | yes | 11.892 | 298 | 371 | 467 | 329 | 506 | 689 | 283 | 353 | 444
HOOMD-Blue | NRF | microsphere | yes | 1x | 28x | 35x | 44x | 31x | 48x | 65x | 27x | 33x | 42x

HPCG

Benchmark

Exercises computational and data access patterns that closely match a broad set of important HPC applications

VERSION

3

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.hpcg-benchmark.org/index.html

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
HPCG | GFLOPS | 256x256x256 local size | yes | 26 | 293 | 576 | 1,056 | 293 | 576 | 1,056
HPCG | NRF | 256x256x256 local size | yes | 1x | 11x | 22x | 41x | 11x | 22x | 41x

LAMMPS

Molecular Dynamics

Classical molecular dynamics package

VERSION

2018_03_16_stable

ACCELERATED FEATURES

  • Lennard-Jones, Gay-Berne, Tersoff, many more potentials

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://lammps.sandia.gov/index.html

https://ngc.nvidia.com/catalog/containers/hpc:lammps

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 9.66E+07 | 4.89E+08 | 9.85E+08 | 1.67E+09 | 5.17E+08 | 1.13E+09 | 2.05E+09
LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 6x | 11x | 18x | 6x | 12x | 23x
LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.18E+07 | 1.47E+08 | 2.75E+08 | 4.63E+08 | 1.61E+08 | 3.05E+08 | 5.36E+08
LAMMPS [EAM] | NRF | EAM | yes | 1x | 3x | 6x | 11x | 4x | 7x | 12x
LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 3.33E+05 | 2.25E+06 | 3.77E+06 | 5.95E+06 | 2.38E+06 | 3.87E+06 | 6.07E+06
LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 15x | 25x | 39x | 15x | 25x | 39x
LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 3.92E+07 | 3.87E+08 | 6.81E+08 | 1.04E+09 | 4.29E+08 | 7.83E+08 | 1.18E+09
LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 10x | 18x | 27x | 11x | 20x | 31x

Linpack

Benchmark

Measures floating point computing power

VERSION

2.1

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://www.top500.org/project/linpack/

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
Linpack | GFLOPS | HPL.dat NB=[256] for GPU server NB=[192] for CPU server | yes | 1,813 | 10,090 | 19,880 | 10,090 | 19,880
Linpack | NRF | HPL.dat NB=[256] for GPU server NB=[192] for CPU server | yes | 1x | 5x | 11x | 5x | 11x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

2019

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/milc/

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
MILC | Total Time (Sec) | Apex Medium | no | 72,194 | 3,098 | 1,534 | 3,104 | 1,530 | 895
MILC | NRF | Apex Medium | yes | 1x | 27x | 55x | 27x | 55x | 94x

MiniFE

Benchmark

Finite Element Analysis

VERSION

0.3

ACCELERATED FEATURES

  • All

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
MiniFE | Total CG Time (Sec) | 350x350x350 | no | 21.35 | 2.94 | 1.44 | 0.82 | 2.94 | 1.44 | 0.82
MiniFE | NRF | 350x350x350 | yes | 1x | 7x | 15x | 26x | 7x | 15x | 26x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

2.13

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 4.10 | 70.68 | 83.40 | 71.35 | 82.55
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 123x | 145x | 124x | 144x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 4.10 | 78.43 | 93.84 | 78.73 | 93.43
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 136x | 163x | 137x | 162x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 4.40 | 81.72 | 99.46 | 81.63 | 96.87
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 95x | 115x | 95x | 112x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.38 | 7.25 | 8.14 | 6.56 | 7.26
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 32x | 36x | 29x | 32x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.38 | 8.01 | 8.95 | 6.91 | 7.75
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 38x | 42x | 33x | 36x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.38 | 8.58 | 9.60 | 7.44 | 8.54
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 34x | 38x | 30x | 34x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

NV-WRFg 3.7.1

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.20 | 0.68 | 0.52 | 0.62 | 0.38
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 9x | 11x | 9x | 15x

Quantum Espresso

Material Science (Quantum Chemistry)

An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale

VERSION

6.1

ACCELERATED FEATURES

  • linear algebra (matrix multiply)
  • explicit computational kernels
  • 3D FFTs

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.quantum-espresso.org

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
Quantum Espresso | Total CPU Time (Sec) | AUSURF112-jR | no | 736 | 200 | 99 | 95 | 190 | 94 | 77
Quantum Espresso | NRF | AUSURF112-jR | yes | 1x | 10x | 20x | 21x | 11x | 21x | 26x

QUDA

Physics

A library for Lattice Quantum Chromodynamics on GPUs

VERSION

2017

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://usqcd-software.github.io/Level3.html#QUDA

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
QUDA | Dslash GFLOPS | QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 | yes | 112 | 2,672 | 4,761 | 5,238 | 2,664 | 5,024 | 6,292
QUDA | NRF | QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 | yes | 1x | 31x | 56x | 62x | 31x | 59x | 74x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install

https://ngc.nvidia.com/catalog/containers/hpc:relion

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2
RELION | 1/Minutes | Plasmodium Ribosome on Relion-2.1 | yes | 1.46E-03 | 1.47E-02 | 1.61E-02 | 1.42E-02 | 1.61E-02
RELION | NRF | Plasmodium Ribosome on Relion-2.1 | yes | 1x | 11x | 12x | 10x | 12x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

2018

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 9,432 | 83,292 | 166,588 | 333,216 | 82,979 | 165,942 | 331,945
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 9x | 18x | 35x | 9x | 18x | 35x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,144 | 15,488 | 30,902 | 61,630 | 16,642 | 32,879 | 65,578
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 5x | 10x | 20x | 5x | 10x | 21x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,144 | 15,314 | 30,459 | 60,769 | 15,378 | 30,585 | 60,990
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 5x | 10x | 19x | 5x | 10x | 19x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

github_a2d23d27

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 8x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 2,807 | 77 | 41 | 24 | 77 | 41 | 23
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 43x | 82x | 141x | 44x | 83x | 143x

VASP

Material Science (Quantum Chemistry)

Complex package for performing ab-initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set

VERSION

5.4.4

ACCELERATED FEATURES

  • Blocked Davidson (ALGO = NORMAL & FAST), RMM-DIIS (ALGO = VERYFAST & FAST), K-Points and optimization

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

www.vasp.at

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x V100 16GB PCIe | 4x V100 16GB PCIe | 2x V100 16GB SXM2 | 4x V100 16GB SXM2 | 8x V100 16GB SXM2
VASP [Si-Huge] | Elapsed Time (Sec) | Si-Huge | no | 4,364 | 2,094 | 1,591 | 2,263 | 1,771 | 1,493
VASP [Si-Huge] | NRF | Si-Huge | yes | 1x | 4x | 9x | 3x | 9x | 10x
VASP [B.hR105] | Elapsed Time (Sec) | B.hR105 | no | 661 | 123 | 80 | 125 | 84 | 75
VASP [B.hR105] | NRF | B.hR105 | yes | 1x | 22x | 34x | 22x | 33x | 36x

HPC Benchmarks

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 9.2.148

Engineering

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 9.2.148

Geoscience

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 10.0.1

Microscopy and Molecular Dynamics

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 9.2.148, CUDA 10.0.1 for NAMD

Physics

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 9.2.148, CUDA 10.0.1 for MILC

Weather and Climate

CPU Server: Dual Xeon Gold 6140@2.30GHz, GPU Server: same CPU server with 4x Tesla T4 PCIe; CUDA Version: CUDA 9.2.148


Detailed T4 application performance data is located below in alphabetical order.


AMBER

Molecular Dynamics

Suite of programs to simulate the molecular dynamics of biomolecules

VERSION

18.13-AT

ACCELERATED FEATURES

  • PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://ambermd.org/GPUSupport.php

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
AMBER [DC-Cellulose_NVE] | ns/day | PME-Cellulose_NVE | yes | 4.5 | 66 | 132 | 264
AMBER [DC-Cellulose_NVE] | NRF | PME-Cellulose_NVE | yes | 1x | 15x | 29x | 59x
AMBER [DC-FactorIX_NPT] | ns/day | Factor IX (NPT) | yes | 22 | 296 | 592 | 1,184
AMBER [DC-FactorIX_NPT] | NRF | Factor IX (NPT) | yes | 1x | 13x | 27x | 54x
AMBER [DC-JAC_NVE] | ns/day | DHFR (NVE) (AKA JAC) | yes | 93 | 1,038 | 2,076 | 4,152
AMBER [DC-JAC_NVE] | NRF | DHFR (NVE) (AKA JAC) | yes | 1x | 11x | 22x | 45x
AMBER [DC-STMV_NPT] | ns/day | STMV (NPT) | yes | 1.6 | 22 | 44 | 88
AMBER [DC-STMV_NPT] | NRF | STMV (NPT) | yes | 1x | 14x | 28x | 55x

Chroma

Physics

Lattice Quantum Chromodynamics (LQCD)

VERSION

2018

ACCELERATED FEATURES

  • Wilson-clover fermions, Krylov solvers, Domain-decomposition

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://jeffersonlab.github.io/chroma/

https://ngc.nvidia.com/catalog/containers/hpc:chroma

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,140 | 101 | 39 | 26
Chroma | NRF | szscl21_24_128 | yes | 1x | 21x | 53x | 80x

CloverLeaf

Benchmark

Hydrodynamics

VERSION

1.3

ACCELERATED FEATURES

  • Lagrangian-Eulerian explicit hydrodynamics mini-application

SCALABILITY

Multi-Node (MPI)

MORE INFORMATION

https://uk-mac.github.io/CloverLeaf/

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe
CloverLeaf | Wall Clock (Sec) | bm32 | no | 2,520 | 1,002
CloverLeaf | NRF | bm32 | yes | 1x | 3x

FUN3D

Engineering

Suite of tools actively developed at NASA that models fluid flow for aeronautics and space technology.

VERSION

13.3

ACCELERATED FEATURES

  • Full range of Mach number regimes for the Reynolds-averaged Navier Stokes (RANS) formulation

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://fun3d.larc.nasa.gov

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 612 | 268 | 132 | 68
FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 2x | 5x | 10x

GROMACS

Molecular Dynamics

Simulation of biochemical molecules with complicated bond interactions

VERSION

2019.1

ACCELERATED FEATURES

  • Implicit (5x), Explicit (2x) Solvent

SCALABILITY

Multi-GPU, Single Node

MORE INFORMATION

http://www.gromacs.org

https://ngc.nvidia.com/catalog/containers/hpc:gromacs

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe
GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 54 | 129 | 153
GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 3x | 5x
GROMACS [Cellulose] | ns/day | Cellulose | yes | 15 | 34 | 42
GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 2x | 3x
GROMACS [STMV] | ns/day | STMV | yes | 3.5 | 9 |
GROMACS [STMV] | NRF | STMV | yes | 1x | 3x | 3x

GTC

Physics

GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas.

VERSION

4.3

ACCELERATED FEATURES

  • Push, shift, and collision

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
GTC | Mpush/Sec | moi#proc.in | yes | 35 | 264 | 523 | 928
GTC | NRF | moi#proc.in | yes | 1x | 8x | 15x | 27x

HPCG

Benchmark

Exercises computational and data access patterns that closely match a broad set of important HPC applications

VERSION

3

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.hpcg-benchmark.org/index.html

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
HPCG | GFLOPS | 256x256x256 local size | yes | 26 | 117 | 230 | 422
HPCG | NRF | 256x256x256 local size | yes | 1x | 4x | 9x | 16x

MILC

Physics

Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons

VERSION

2019

ACCELERATED FEATURES

  • Staggered fermions, Krylov solvers, Gauge-link fattening

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

https://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/milc/

https://ngc.nvidia.com/catalog/containers/hpc:milc

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MILC | Total Time (Sec) | Apex Medium | no | 72,194 | 7,498 | 3,763 | 2,053
MILC | NRF | Apex Medium | yes | 1x | 11x | 22x | 41x

MiniFE

Benchmark

Finite Element Analysis

VERSION

0.3

ACCELERATED FEATURES

  • All

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
MiniFE | Total CG Time (Sec) | 350x350x350 | no | 21.3 | 7.2 | 3.6 | 1.9
MiniFE | NRF | 350x350x350 | yes | 1x | 3x | 6x | 11x

NAMD

Molecular Dynamics

Designed for high-performance simulation of large molecular systems

VERSION

2.13

ACCELERATED FEATURES

  • Full electrostatics with PME and most simulation features

SCALABILITY

Up to 100M atom capable, multi-GPU

MORE INFORMATION

http://www.ks.uiuc.edu/Research/namd/

https://ngc.nvidia.com/catalog/containers/hpc:namd

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe
NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 4.1 | 51 | 70
NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 89x | 123x
NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 4.1 | 52 | 75
NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 91x | 130x
NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 4.4 | 57 | 80
NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 66x | 93x
NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 0.4 | 5 | 7
NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 20x | 32x
NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 0.4 | 5 | 7
NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 21x | 35x
NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 0.4 | 5 | 8
NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 19x | 31x

NV-WRFg

Numerical Weather Prediction

Numerical weather prediction system designed for both atmospheric research and operational forecasting applications

VERSION

NV-WRFg 3.7.1

ACCELERATED FEATURES

  • Dynamics modules
  • Several Physics modules

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://wrfg.net/wrfg-description/

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 4x T4 PCIe | 8x T4 PCIe
NV-WRFg | Seconds / Timestamps | Conus_2.5k_JA | no | 5.2 | 1.8 | 1.0
NV-WRFg | NRF | Conus_2.5k_JA | yes | 1x | 3x | 6x

QUDA

Physics

A library for Lattice Quantum Chromodynamics on GPUs

VERSION

2017

ACCELERATED FEATURES

  • All

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

http://usqcd-software.github.io/Level3.html#QUDA

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
QUDA | Dslash GFLOPS | QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 | yes | 112 | 1,984 | 2,319 | 3,627
QUDA | NRF | QPhil Dslash Wilson-Clover Precision: Single; Gauge Compression/Recon: 12; Problem Size 32x32x32x64 | yes | 1x | 23x | 27x | 43x

RELION

Microscopy

Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION

3

ACCELERATED FEATURES

  • Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

SCALABILITY

Multi-GPU and Single Node

MORE INFORMATION

https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install

https://ngc.nvidia.com/catalog/containers/hpc:relion

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RELION | 1/Minutes | Plasmodium Ribosome on Relion-2.1 | yes | 1.46E-03 | 1.19E-02 | 1.54E-02 | 1.67E-02
RELION | NRF | Plasmodium Ribosome on Relion-2.1 | yes | 1x | 9x | 11x | 12x

RTM

Geoscience

Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration

VERSION

2018

ACCELERATED FEATURES

  • Batch algorithm

SCALABILITY

Multi-GPU and Multi-Node

MORE INFORMATION

http://www.tsunamidevelopment.com/assets/rtm.pdf

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 9,432 | 29,446 | 58,948 | 117,570
RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 3x | 6x | 12x
RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,144 | 5,985 | 11,774 | 23,322
RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 2x | 4x | 7x
RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,144 | 5,914 | 11,743 | 23,422
RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 2x | 4x | 7x

SPECFEM3D

Geoscience

Simulates seismic wave propagation

VERSION

github_a2d23d27

ACCELERATED FEATURES

  • OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY

Multi-GPU and Single-Node

MORE INFORMATION

https://geodynamics.org/cig/software/specfem3d/

Application | Metric | Test Modules | Bigger is better | Dual Skylake (CPU-Only) | 2x T4 PCIe | 4x T4 PCIe | 8x T4 PCIe
SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 2,807 | 208 | 105 | 56
SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 16x | 32x | 60x