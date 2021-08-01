NVIDIA HPC Application Performance
Deep Learning performance data is published on a separate page.
Modern HPC data centers are key to solving some of the world’s most important scientific and engineering challenges. NVIDIA data center GPUs fundamentally change the economics of the data center: they deliver breakthrough performance with dramatically fewer servers, lower power consumption, and reduced networking overhead, resulting in total cost savings of 5x to 10x.
The node replacement factor (NRF) is the number of CPU-only servers that a single GPU-accelerated server replaces. To determine the NRF, we measure application performance on up to 8 CPU-only servers and assume linear scaling to extrapolate beyond 8 servers. The NRF varies by application.
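The NRF procedure is described above only in outline. The Python sketch below shows one plausible way to compute it. The linear interpolation between measured server counts, the constant per-server throughput assumed beyond the largest measured count, and the function name `node_replacement_factor` are our assumptions, not NVIDIA's published method. The second usage example uses hypothetical CPU scaling numbers.

```python
def node_replacement_factor(cpu_perf: dict[int, float], gpu_perf: float) -> float:
    """Estimate how many CPU-only servers deliver the throughput of one GPU server.

    cpu_perf: measured CPU-only server count -> throughput (bigger is better,
              e.g. ns/day). Assumed to cover counts up to 8 servers.
    gpu_perf: throughput of the single GPU-accelerated server.
    """
    if not cpu_perf or gpu_perf <= 0 or any(p <= 0 for p in cpu_perf.values()):
        raise ValueError("need positive CPU measurements and GPU throughput")

    prev_nodes, prev_perf = 0, 0.0
    for nodes in sorted(cpu_perf):
        perf = cpu_perf[nodes]
        if perf >= gpu_perf:
            # Assumption: interpolate linearly between neighbouring measurements.
            frac = (gpu_perf - prev_perf) / (perf - prev_perf)
            return prev_nodes + frac * (nodes - prev_nodes)
        prev_nodes, prev_perf = nodes, perf

    # Beyond the largest measured count: assume throughput keeps growing in
    # proportion to server count (constant per-server throughput).
    return prev_nodes * gpu_perf / prev_perf


# Single-server baseline, AMBER PME-Cellulose_NPT: 4.13 vs 318 ns/day (1x H100 SXM).
print(round(node_replacement_factor({1: 4.13}, 318)))  # 77

# Hypothetical, sublinear CPU scaling measured on 1 to 8 servers.
scaling = {1: 100.0, 2: 190.0, 4: 360.0, 8: 640.0}
print(node_replacement_factor(scaling, 500.0))   # 6.0 (interpolated)
print(node_replacement_factor(scaling, 1280.0))  # 16.0 (extrapolated)
```

With only a single-server measurement, the estimate reduces to the plain throughput ratio, which reproduces the 77x listed for AMBER PME-Cellulose_NPT on 1x H100 SXM. With sublinear CPU scaling, more servers are needed than the single-server ratio suggests (6 rather than 5 in the hypothetical case).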
Detailed H100 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.0-AT_22.3
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|AMBER [PME-Cellulose_NPT_4fs]|ns/day|DC-Cellulose_NPT|yes|4.13|318|633|1,264|
|AMBER [PME-Cellulose_NPT_4fs]|NRF|DC-Cellulose_NPT|yes|1x|77x|153x|306x|
|AMBER [PME-Cellulose_NVE_4fs]|ns/day|DC-Cellulose_NVE|yes|4.12|313|649|1,260|
|AMBER [PME-Cellulose_NVE_4fs]|NRF|DC-Cellulose_NVE|yes|1x|76x|157x|306x|
|AMBER [PME-FactorIX_NPT_4fs]|ns/day|DC-FactorIX_NPT|yes|20.71|1,326|2,680|5,330|
|AMBER [PME-FactorIX_NPT_4fs]|NRF|DC-FactorIX_NPT|yes|1x|64x|129x|257x|
|AMBER [PME-FactorIX_NVE_4fs]|ns/day|DC-FactorIX_NVE|yes|20.95|1,356|2,721|5,416|
|AMBER [PME-FactorIX_NVE_4fs]|NRF|DC-FactorIX_NVE|yes|1x|65x|130x|259x|
|AMBER [PME-JAC_NPT_4fs]|ns/day|DC-JAC_NPT|yes|84.61|4,540|9,197|17,967|
|AMBER [PME-JAC_NPT_4fs]|NRF|DC-JAC_NPT|yes|1x|54x|109x|212x|
|AMBER [PME-JAC_NVE_4fs]|ns/day|DC-JAC_NVE|yes|85.16|4,590|9,301|20,152|
|AMBER [PME-JAC_NVE_4fs]|NRF|DC-JAC_NVE|yes|1x|54x|109x|237x|
|AMBER [PME-STMV_NPT_4fs]|ns/day|DC-STMV_NPT|yes|1.38|85|170|340|
|AMBER [PME-STMV_NPT_4fs]|NRF|DC-STMV_NPT|yes|1x|62x|123x|246x|
|AMBER [FEP-GTI_Complex 1fs]|ns/day|FEP-GTI_Complex|yes|9.89|194|388|776|
|AMBER [FEP-GTI_Complex 1fs]|NRF|FEP-GTI_Complex|yes|1x|20x|39x|78x|
FUN3D
Engineering
Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology
VERSION
13.7 (update 1)
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|FUN3D|Loop Time (Sec)|dpw_wbt0_crs-3.6Mn_5|no|495|30|17|10|
|FUN3D|NRF|dpw_wbt0_crs-3.6Mn_5|yes|1x|21x|37x|59x|
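Rows whose Bigger is better column reads no (Loop Time, Total Time, Integrate_nh) report runtimes, so a speedup is the CPU-only time divided by the GPU time. The minimal Python sketch below (the helper `speedup` is hypothetical) applies this to the FUN3D row. The single-server runtime ratio comes out smaller than the listed NRF, presumably because the NRF is derived from measured multi-server CPU scaling rather than from a single-server comparison.

```python
def speedup(cpu_value: float, gpu_value: float, bigger_is_better: bool) -> float:
    """Single-server speedup of a GPU result over the CPU-only result."""
    if cpu_value <= 0 or gpu_value <= 0:
        raise ValueError("measurements must be positive")
    if bigger_is_better:
        # Throughput metrics (ns/day, Mcells/s, ...): GPU rate over CPU rate.
        return gpu_value / cpu_value
    # Runtime metrics (seconds): CPU time over GPU time.
    return cpu_value / gpu_value


# FUN3D loop time on the dual Cascade Lake 6240 server vs. H100 SXM.
for gpus, seconds in [(1, 30), (2, 17), (4, 10)]:
    ratio = speedup(495, seconds, bigger_is_better=False)
    print(f"{gpus}x H100 SXM: {ratio:.1f}x")
```

The printed ratios (16.5x, 29.1x, 49.5x) sit below the listed NRF values (21x, 37x, 59x), so single-server ratios should not be read as NRFs.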
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2022.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|GROMACS [ADH Dodec]|ns/day|ADH Dodec|yes|67|626|723|896|
|GROMACS [ADH Dodec]|NRF|ADH Dodec|yes|1x|12x|14x|18x|
|GROMACS [Cellulose]|ns/day|Cellulose|yes|19|189|246|350|
|GROMACS [Cellulose]|NRF|Cellulose|yes|1x|14x|19x|27x|
|GROMACS [STMV]|ns/day|STMV|yes|4|42|68|115|
|GROMACS [STMV]|NRF|STMV|yes|1x|10x|17x|28x|
GTC
Physics
GTC is used for gyrokinetic particle simulation of turbulent transport in burning plasmas
VERSION
V 4.5 Updated
ACCELERATED FEATURES
- Push, shift, and collision
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|GTC|Mpush/Sec|moi#proc.in|yes|35|758|1,370|2,441|
|GTC|NRF|moi#proc.in|yes|1x|22x|40x|71x|
ICON
Weather and Climate
A global unified atmosphere model for numerical weather prediction and climate modeling research
VERSION
2.6.5_RC
ACCELERATED FEATURES
- Full model of dynamics and physics
SCALABILITY
Multi-GPU and Multi-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|ICON [SLAM 191 - 160KM - no radiation]|Integrate_nh (sec)|SLAM 191 levels 160 km resolution without radiation|no|2,431|204|149|113|
|ICON [SLAM 191 - 160KM - no radiation]|NRF|SLAM 191 levels 160 km resolution without radiation|yes|1x|12x|16x|22x|
|ICON [QUBICC 160 km resolution]|Integrate_nh (sec)|SLAM 191 levels 160 km resolution with radiation|no|2,213|188|133|101|
|ICON [QUBICC 160 km resolution]|NRF|SLAM 191 levels 160 km resolution with radiation|yes|1x|12x|17x|22x|
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
stable_23Jun2022_update1
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://lammps.sandia.gov/index.html
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|LAMMPS [LJ 2.5]|ATOM-Time Steps/s|LJ 2.5|yes|1.11E+08|1.07E+09|1.93E+09|3.35E+09|
|LAMMPS [LJ 2.5]|NRF|LJ 2.5|yes|1x|10x|18x|31x|
|LAMMPS [EAM]|ATOM-Time Steps/s|EAM|yes|5.33E+07|5.13E+08|9.12E+08|1.60E+09|
|LAMMPS [EAM]|NRF|EAM|yes|1x|10x|18x|31x|
|LAMMPS [ReaxFF/C]|ATOM-Time Steps/s|ReaxFF/C|yes|4.45E+05|1.02E+07|1.84E+07|3.04E+07|
|LAMMPS [ReaxFF/C]|NRF|ReaxFF/C|yes|1x|31x|57x|94x|
|LAMMPS [SNAP]|ATOM-Time Steps/s|SNAP|yes|1.08E+05|3.87E+06|7.69E+06|1.52E+07|
|LAMMPS [SNAP]|NRF|SNAP|yes|1x|37x|74x|147x|
|LAMMPS [Tersoff]|ATOM-Time Steps/s|Tersoff|yes|2.77E+07|9.16E+08|1.64E+09|2.95E+09|
|LAMMPS [Tersoff]|NRF|Tersoff|yes|1x|34x|60x|108x|
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
feature/gauge-action-quda_16a2d47119
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|MILC|Total Time (Sec)|Apex Medium|no|71,595|1,172|634|355|
|MILC|NRF|Apex Medium|yes|1x|67x|124x|222x|
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a (AVX512)
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atoms; multi-GPU, single node
MORE INFORMATION
http://www.ks.uiuc.edu/Research/namd/
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|NAMD [apoa1_npt_cuda]|Ave ns/day|apoa1_npt_cuda|yes|19.15|284|549|1,048|
|NAMD [apoa1_npt_cuda]|NRF|apoa1_npt_cuda|yes|1x|15x|29x|55x|
|NAMD [apoa1_nptsr_cuda]|Ave ns/day|apoa1_nptsr_cuda|yes|19.59|291|570|1,124|
|NAMD [apoa1_nptsr_cuda]|NRF|apoa1_nptsr_cuda|yes|1x|15x|29x|57x|
|NAMD [apoa1_nve_cuda]|Ave ns/day|apoa1_nve_cuda|yes|20.75|363|698|1,386|
|NAMD [apoa1_nve_cuda]|NRF|apoa1_nve_cuda|yes|1x|17x|34x|67x|
|NAMD [stmv_npt_cuda]|Ave ns/day|stmv_npt_cuda|yes|1.87|23|45|89|
|NAMD [stmv_npt_cuda]|NRF|stmv_npt_cuda|yes|1x|12x|24x|48x|
|NAMD [stmv_nptsr_cuda]|Ave ns/day|stmv_nptsr_cuda|yes|1.81|23|46|92|
|NAMD [stmv_nptsr_cuda]|NRF|stmv_nptsr_cuda|yes|1x|13x|26x|51x|
|NAMD [stmv_nve_cuda]|Ave ns/day|stmv_nve_cuda|yes|1.94|27|54|108|
|NAMD [stmv_nve_cuda]|NRF|stmv_nve_cuda|yes|1x|14x|28x|56x|
RTM
Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
nvidia_2021_05
ACCELERATED FEATURES
- Batch algorithm
SCALABILITY
Multi-GPU and Multi-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|RTM [Isotropic Radius 4]|Mcells/s|Isotropic Radius 4|yes|11,318|124,975|249,251|498,066|
|RTM [Isotropic Radius 4]|NRF|Isotropic Radius 4|yes|1x|11x|22x|44x|
|RTM [TTI Radius 8 1-pass]|Mcells/s|TTI Radius 8 1-pass|yes|3,773|22,109|44,135|88,094|
|RTM [TTI Radius 8 1-pass]|NRF|TTI Radius 8 1-pass|yes|1x|6x|12x|23x|
|RTM [TTI RX 2Pass mgpu]|Mcells/s|TTI RX 2Pass mgpu|yes|3,773|21,704|43,088|85,784|
|RTM [TTI RX 2Pass mgpu]|NRF|TTI RX 2Pass mgpu|yes|1x|6x|11x|23x|
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_fef2ace9
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
SCALABILITY
Multi-GPU and Single-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x H100 SXM|2x H100 SXM|4x H100 SXM|
|---|---|---|---|---|---|---|---|
|SPECFEM3D|Total Time (Sec)|four_material_simple_model|no|1,268|46|24|14|
|SPECFEM3D|NRF|four_material_simple_model|yes|1x|32x|59x|105x|
Detailed L40 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.0-AT_22.3
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x L40|2x L40|4x L40|8x L40|
|---|---|---|---|---|---|---|---|---|
|AMBER [PME-FactorIX_NPT_4fs]|ns/day|DC-FactorIX_NPT|yes|20.71|779|1,575|3,147|6,303|
|AMBER [PME-FactorIX_NPT_4fs]|NRF|DC-FactorIX_NPT|yes|1x|38x|76x|152x|304x|
|AMBER [PME-FactorIX_NVE_4fs]|ns/day|DC-FactorIX_NVE|yes|20.95|797|1,613|3,182|6,428|
|AMBER [PME-FactorIX_NVE_4fs]|NRF|DC-FactorIX_NVE|yes|1x|38x|77x|152x|307x|
|AMBER [PME-JAC_NPT_4fs]|ns/day|DC-JAC_NPT|yes|84.61|3,270|6,561|12,901|26,316|
|AMBER [PME-JAC_NPT_4fs]|NRF|DC-JAC_NPT|yes|1x|39x|78x|152x|311x|
|AMBER [PME-JAC_NVE_4fs]|ns/day|DC-JAC_NVE|yes|85.16|3,297|6,656|13,250|26,477|
|AMBER [PME-JAC_NVE_4fs]|NRF|DC-JAC_NVE|yes|1x|39x|78x|156x|311x|
|AMBER [PME-STMV_NPT_4fs]|ns/day|DC-STMV_NPT|yes|1.38|62|124|248|497|
|AMBER [PME-STMV_NPT_4fs]|NRF|DC-STMV_NPT|yes|1x|45x|90x|180x|360x|
FUN3D
Engineering
Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology
VERSION
13.7 (update 1)
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x L40|2x L40|4x L40|8x L40|
|---|---|---|---|---|---|---|---|---|
|FUN3D|Loop Time (Sec)|dpw_wbt0_crs-3.6Mn_5|no|495|119|61|32|19|
|FUN3D|NRF|dpw_wbt0_crs-3.6Mn_5|yes|1x|5x|10x|19x|32x|
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2022.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x L40|2x L40|4x L40|
|---|---|---|---|---|---|---|---|
|GROMACS [ADH Dodec]|ns/day|ADH Dodec|yes|67|566|-|-|
|GROMACS [ADH Dodec]|NRF|ADH Dodec|yes|1x|11x|-|-|
|GROMACS [Cellulose]|ns/day|Cellulose|yes|19|161|-|212|
|GROMACS [Cellulose]|NRF|Cellulose|yes|1x|12x|-|16x|
|GROMACS [STMV]|ns/day|STMV|yes|4|33|55|81|
|GROMACS [STMV]|NRF|STMV|yes|1x|8x|13x|20x|
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
stable_23Jun2022_update1
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://lammps.sandia.gov/index.html
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x L40|2x L40|4x L40|8x L40|
|---|---|---|---|---|---|---|---|---|
|LAMMPS [ReaxFF/C]|ATOM-Time Steps/s|ReaxFF/C|yes|4.45E+05|1.51E+06|2.89E+06|5.35E+06|8.11E+06|
|LAMMPS [ReaxFF/C]|NRF|ReaxFF/C|yes|1x|4x|9x|16x|25x|
|LAMMPS [SNAP]|ATOM-Time Steps/s|SNAP|yes|1.08E+05|5.82E+05|1.16E+06|2.32E+06|4.58E+06|
|LAMMPS [SNAP]|NRF|SNAP|yes|1x|6x|11x|22x|44x|
|LAMMPS [Tersoff]|ATOM-Time Steps/s|Tersoff|yes|2.77E+07|1.23E+08|2.40E+08|4.63E+08|7.01E+08|
|LAMMPS [Tersoff]|NRF|Tersoff|yes|1x|4x|9x|17x|26x|
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a (AVX512)
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atoms; multi-GPU, single node
MORE INFORMATION
http://www.ks.uiuc.edu/Research/namd/
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x L40|2x L40|4x L40|8x L40|
|---|---|---|---|---|---|---|---|---|
|NAMD [apoa1_npt_cuda]|Ave ns/day|apoa1_npt_cuda|yes|19.15|188|386|761|1,542|
|NAMD [apoa1_npt_cuda]|NRF|apoa1_npt_cuda|yes|1x|10x|20x|40x|81x|
|NAMD [apoa1_nptsr_cuda]|Ave ns/day|apoa1_nptsr_cuda|yes|19.59|191|388|767|1,571|
|NAMD [apoa1_nptsr_cuda]|NRF|apoa1_nptsr_cuda|yes|1x|10x|20x|39x|80x|
|NAMD [apoa1_nve_cuda]|Ave ns/day|apoa1_nve_cuda|yes|20.75|240|481|970|1,917|
|NAMD [apoa1_nve_cuda]|NRF|apoa1_nve_cuda|yes|1x|12x|23x|47x|92x|
|NAMD [stmv_npt_cuda]|Ave ns/day|stmv_npt_cuda|yes|1.87|15|30|59|120|
|NAMD [stmv_npt_cuda]|NRF|stmv_npt_cuda|yes|1x|8x|16x|32x|64x|
|NAMD [stmv_nptsr_cuda]|Ave ns/day|stmv_nptsr_cuda|yes|1.81|15|31|62|123|
|NAMD [stmv_nptsr_cuda]|NRF|stmv_nptsr_cuda|yes|1x|8x|17x|34x|68x|
|NAMD [stmv_nve_cuda]|Ave ns/day|stmv_nve_cuda|yes|1.94|18|35|70|142|
|NAMD [stmv_nve_cuda]|NRF|stmv_nve_cuda|yes|1x|9x|18x|36x|73x|
Detailed L4 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.0-AT_22.3
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x L4|2x L4|4x L4|8x L4|
|---|---|---|---|---|---|---|---|---|
|AMBER [PME-JAC_NPT_4fs]|ns/day|DC-JAC_NPT|yes|84.61|1,146|2,323|4,731|9,554|
|AMBER [PME-JAC_NPT_4fs]|NRF|DC-JAC_NPT|yes|1x|14x|27x|56x|113x|
|AMBER [PME-JAC_NVE_4fs]|ns/day|DC-JAC_NVE|yes|85.16|1,162|2,366|4,811|9,666|
|AMBER [PME-JAC_NVE_4fs]|NRF|DC-JAC_NVE|yes|1x|14x|28x|56x|113x|
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2022.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x L4|2x L4|4x L4|8x L4|
|---|---|---|---|---|---|---|---|---|
|GROMACS [ADH Dodec]|ns/day|ADH Dodec|yes|67|209|346|464|-|
|GROMACS [ADH Dodec]|NRF|ADH Dodec|yes|1x|4x|7x|9x|-|
|GROMACS [Cellulose]|ns/day|Cellulose|yes|19|57|94|133|162|
|GROMACS [Cellulose]|NRF|Cellulose|yes|1x|3x|6x|10x|12x|
|GROMACS [STMV]|ns/day|STMV|yes|4|12|22|43|63|
|GROMACS [STMV]|NRF|STMV|yes|1x|3x|5x|10x|15x|
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a (AVX512)
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atoms; multi-GPU, single node
MORE INFORMATION
http://www.ks.uiuc.edu/Research/namd/
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x L4|2x L4|4x L4|8x L4|
|---|---|---|---|---|---|---|---|---|
|NAMD [apoa1_npt_cuda]|Ave ns/day|apoa1_npt_cuda|yes|19.15|63|128|260|520|
|NAMD [apoa1_npt_cuda]|NRF|apoa1_npt_cuda|yes|1x|3x|7x|14x|27x|
|NAMD [apoa1_nptsr_cuda]|Ave ns/day|apoa1_nptsr_cuda|yes|19.59|-|131|266|535|
|NAMD [apoa1_nptsr_cuda]|NRF|apoa1_nptsr_cuda|yes|1x|-|7x|14x|27x|
|NAMD [apoa1_nve_cuda]|Ave ns/day|apoa1_nve_cuda|yes|20.75|86|172|347|701|
|NAMD [apoa1_nve_cuda]|NRF|apoa1_nve_cuda|yes|1x|4x|8x|17x|34x|
|NAMD [stmv_npt_cuda]|Ave ns/day|stmv_npt_cuda|yes|1.87|5|9|18|37|
|NAMD [stmv_npt_cuda]|NRF|stmv_npt_cuda|yes|1x|2x|5x|10x|20x|
|NAMD [stmv_nptsr_cuda]|Ave ns/day|stmv_nptsr_cuda|yes|1.81|5|-|19|39|
|NAMD [stmv_nptsr_cuda]|NRF|stmv_nptsr_cuda|yes|1x|3x|-|11x|22x|
|NAMD [stmv_nve_cuda]|Ave ns/day|stmv_nve_cuda|yes|1.94|6|12|24|47|
|NAMD [stmv_nve_cuda]|NRF|stmv_nve_cuda|yes|1x|3x|6x|12x|24x|
Detailed A100 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.0-AT_22.3
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|AMBER [PME-Cellulose_NPT_4fs]|ns/day|DC-Cellulose_NPT|yes|4.13|182|364|726|1,456|172|334|674|1,375|
|AMBER [PME-Cellulose_NPT_4fs]|NRF|DC-Cellulose_NPT|yes|1x|44x|88x|176x|353x|42x|81x|163x|333x|
|AMBER [PME-Cellulose_NVE_4fs]|ns/day|DC-Cellulose_NVE|yes|4.12|185|371|739|1,483|176|340|686|1,366|
|AMBER [PME-Cellulose_NVE_4fs]|NRF|DC-Cellulose_NVE|yes|1x|45x|90x|179x|360x|43x|83x|167x|331x|
|AMBER [PME-FactorIX_NPT_4fs]|ns/day|DC-FactorIX_NPT|yes|20.71|796|1,594|3,175|6,383|769|1,525|3,054|6,139|
|AMBER [PME-FactorIX_NPT_4fs]|NRF|DC-FactorIX_NPT|yes|1x|38x|77x|153x|308x|37x|74x|147x|296x|
|AMBER [PME-FactorIX_NVE_4fs]|ns/day|DC-FactorIX_NVE|yes|20.95|813|1,631|3,257|6,532|781|1,514|3,132|6,303|
|AMBER [PME-FactorIX_NVE_4fs]|NRF|DC-FactorIX_NVE|yes|1x|39x|78x|155x|312x|37x|72x|149x|301x|
|AMBER [PME-JAC_NPT_4fs]|ns/day|DC-JAC_NPT|yes|84.61|2,883|5,761|11,512|23,433|2,819|5,476|11,121|23,249|
|AMBER [PME-JAC_NPT_4fs]|NRF|DC-JAC_NPT|yes|1x|34x|68x|136x|277x|33x|65x|131x|275x|
|AMBER [PME-JAC_NVE_4fs]|ns/day|DC-JAC_NVE|yes|85.16|2,953|5,894|11,693|23,935|2,900|5,787|11,396|23,903|
|AMBER [PME-JAC_NVE_4fs]|NRF|DC-JAC_NVE|yes|1x|35x|69x|137x|281x|34x|68x|134x|281x|
|AMBER [PME-STMV_NPT_4fs]|ns/day|DC-STMV_NPT|yes|1.38|54|107|214|429|53|107|214|427|
|AMBER [PME-STMV_NPT_4fs]|NRF|DC-STMV_NPT|yes|1x|39x|78x|155x|311x|39x|77x|155x|310x|
|AMBER [FEP-GTI_Complex 1fs]|ns/day|FEP-GTI_Complex|yes|9.89|133|266|533|1,066|134|268|536|1,073|
|AMBER [FEP-GTI_Complex 1fs]|NRF|FEP-GTI_Complex|yes|1x|13x|27x|54x|108x|14x|27x|54x|108x|
Chroma
Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V 2021.08
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://jeffersonlab.github.io/chroma/
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|Chroma|Total Time (Sec)|szscl21_24_128|no|1,115|36|20|11|7|44|25|13|9|
|Chroma|NRF|szscl21_24_128|yes|1x|32x|55x|99x|163x|26x|46x|84x|129x|
FUN3D
Engineering
Suite of tools, actively developed at NASA, that models fluid flow for aeronautics and space technology
VERSION
13.7 (update 1)
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|FUN3D|Loop Time (Sec)|dpw_wbt0_crs-3.6Mn_5|no|495|52|28|16|11|54|29|16|13|
|FUN3D|NRF|dpw_wbt0_crs-3.6Mn_5|yes|1x|12x|22x|39x|55x|11x|21x|39x|49x|
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2022.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|GROMACS [ADH Dodec]|ns/day|ADH Dodec|yes|67|372|506|677|-|389|-|518|-|
|GROMACS [ADH Dodec]|NRF|ADH Dodec|yes|1x|7x|10x|13x|-|8x|-|10x|-|
|GROMACS [Cellulose]|ns/day|Cellulose|yes|19|108|174|254|290|108|122|183|-|
|GROMACS [Cellulose]|NRF|Cellulose|yes|1x|8x|13x|19x|22x|8x|9x|14x|-|
|GROMACS [STMV]|ns/day|STMV|yes|4|24|44|80|128|24|39|65|92|
|GROMACS [STMV]|NRF|STMV|yes|1x|5x|11x|20x|31x|5x|9x|16x|22x|
GTC
Physics
GTC is used for gyrokinetic particle simulation of turbulent transport in burning plasmas
VERSION
V 4.5 Updated
ACCELERATED FEATURES
- Push, shift, and collision
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|
|GTC|Mpush/Sec|moi#proc.in|yes|35|472|898|3,622|478|909|1,755|2,706|
|GTC|NRF|moi#proc.in|yes|1x|14x|26x|105x|14x|26x|51x|79x|
ICON
Weather and Climate
A global unified atmosphere model for numerical weather prediction and climate modeling research
VERSION
2.6.5_RC
ACCELERATED FEATURES
- Full model of dynamics and physics
SCALABILITY
Multi-GPU and Multi-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|
|ICON [SLAM 191 - 160KM - no radiation]|Integrate_nh (sec)|SLAM 191 levels 160 km resolution without radiation|no|2,431|317|218|158|134|318|224|165|
|ICON [SLAM 191 - 160KM - no radiation]|NRF|SLAM 191 levels 160 km resolution without radiation|yes|1x|8x|11x|15x|18x|8x|11x|15x|
|ICON [QUBICC 160 km resolution]|Integrate_nh (sec)|SLAM 191 levels 160 km resolution with radiation|no|2,213|293|197|144|120|291|192|140|
|ICON [QUBICC 160 km resolution]|NRF|SLAM 191 levels 160 km resolution with radiation|yes|1x|8x|11x|15x|18x|8x|12x|16x|
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
stable_23Jun2022_update1
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://lammps.sandia.gov/index.html
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|LAMMPS [LJ 2.5]|ATOM-Time Steps/s|LJ 2.5|yes|1.11E+08|6.00E+08|1.12E+09|2.01E+09|3.66E+09|6.00E+08|1.07E+09|1.81E+09|-|
|LAMMPS [LJ 2.5]|NRF|LJ 2.5|yes|1x|6x|10x|19x|34x|6x|10x|17x|-|
|LAMMPS [EAM]|ATOM-Time Steps/s|EAM|yes|5.33E+07|2.93E+08|5.35E+08|9.23E+08|1.58E+09|2.88E+08|5.04E+08|8.48E+08|-|
|LAMMPS [EAM]|NRF|EAM|yes|1x|6x|10x|18x|31x|5x|10x|17x|-|
|LAMMPS [ReaxFF/C]|ATOM-Time Steps/s|ReaxFF/C|yes|4.45E+05|5.24E+06|9.68E+06|1.70E+07|2.69E+07|5.28E+06|9.53E+06|1.62E+07|1.97E+07|
|LAMMPS [ReaxFF/C]|NRF|ReaxFF/C|yes|1x|16x|30x|52x|83x|16x|29x|50x|61x|
|LAMMPS [SNAP]|ATOM-Time Steps/s|SNAP|yes|1.08E+05|2.21E+06|4.39E+06|8.73E+06|1.67E+07|2.11E+06|4.09E+06|8.12E+06|1.58E+07|
|LAMMPS [SNAP]|NRF|SNAP|yes|1x|21x|42x|85x|162x|20x|40x|79x|153x|
|LAMMPS [Tersoff]|ATOM-Time Steps/s|Tersoff|yes|2.77E+07|5.28E+08|9.81E+08|1.75E+09|2.99E+09|5.09E+08|8.74E+08|1.40E+09|-|
|LAMMPS [Tersoff]|NRF|Tersoff|yes|1x|19x|36x|64x|110x|19x|32x|51x|-|
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
feature/gauge-action-quda_16a2d47119
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|
|MILC|Total Time (Sec)|Apex Medium|no|71,595|2,029|1,184|629|361|2,088|1,111|614|
|MILC|NRF|Apex Medium|yes|1x|39x|67x|125x|218x|38x|71x|128x|
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a (AVX512)
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atoms; multi-GPU, single node
MORE INFORMATION
http://www.ks.uiuc.edu/Research/namd/
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|NAMD [apoa1_npt_cuda]|Ave ns/day|apoa1_npt_cuda|yes|19.15|175|347|689|1,368|172|341|693|1,372|
|NAMD [apoa1_npt_cuda]|NRF|apoa1_npt_cuda|yes|1x|9x|18x|36x|71x|9x|18x|36x|72x|
|NAMD [apoa1_nptsr_cuda]|Ave ns/day|apoa1_nptsr_cuda|yes|19.59|178|357|714|1,389|178|354|711|1,399|
|NAMD [apoa1_nptsr_cuda]|NRF|apoa1_nptsr_cuda|yes|1x|9x|18x|36x|71x|9x|18x|36x|71x|
|NAMD [apoa1_nve_cuda]|Ave ns/day|apoa1_nve_cuda|yes|20.75|215|436|870|1,731|214|424|851|1,714|
|NAMD [apoa1_nve_cuda]|NRF|apoa1_nve_cuda|yes|1x|10x|21x|42x|83x|10x|20x|41x|83x|
|NAMD [stmv_npt_cuda]|Ave ns/day|stmv_npt_cuda|yes|1.87|14|27|43|65|13|27|53|104|
|NAMD [stmv_npt_cuda]|NRF|stmv_npt_cuda|yes|1x|7x|14x|23x|35x|7x|14x|29x|56x|
|NAMD [stmv_nptsr_cuda]|Ave ns/day|stmv_nptsr_cuda|yes|1.81|14|28|56|66|14|26|55|110|
|NAMD [stmv_nptsr_cuda]|NRF|stmv_nptsr_cuda|yes|1x|8x|15x|31x|36x|8x|15x|31x|61x|
|NAMD [stmv_nve_cuda]|Ave ns/day|stmv_nve_cuda|yes|1.94|16|32|50|128|16|31|61|127|
|NAMD [stmv_nve_cuda]|NRF|stmv_nve_cuda|yes|1x|8x|17x|26x|66x|8x|16x|31x|65x|
Quantum Espresso
Material Science (Quantum Chemistry)
An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale
VERSION
V7.0 CPU; V7.1 GPU
ACCELERATED FEATURES
- linear algebra (matrix multiply)
- explicit computational kernels
- 3D FFTs
SCALABILITY
Multi-GPU and Multi-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|Quantum Espresso|Total CPU Time (Sec)|AUSURF112-jR|no|718|111|71|47|36|114|70|49|39|
|Quantum Espresso|NRF|AUSURF112-jR|yes|1x|7x|11x|17x|22x|7x|11x|16x|20x|
RELION
Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to the refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
3.1.3
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|
|Relion [Plasmodium Ribosome]|Total Wall Clock (Sec)|MB numbers Plasmodium Ribosome on Relion-3.0|no|12,742|2,736|1,627|1,439|2,601|1,523|1,383|
|Relion [Plasmodium Ribosome]|NRF|MB numbers Plasmodium Ribosome on Relion-3.0|yes|1x|5x|8x|9x|5x|8x|9x|
RTM
Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
nvidia_2021_05
ACCELERATED FEATURES
- Batch algorithm
SCALABILITY
Multi-GPU and Multi-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|RTM [Isotropic Radius 4]|Mcells/s|Isotropic Radius 4|yes|11,318|89,561|178,511|356,907|713,883|89,536|178,551|339,823|713,096|
|RTM [Isotropic Radius 4]|NRF|Isotropic Radius 4|yes|1x|8x|16x|32x|63x|8x|16x|30x|63x|
|RTM [TTI Radius 8 1-pass]|Mcells/s|TTI Radius 8 1-pass|yes|3,773|12,903|25,764|51,122|102,187|12,901|25,796|51,402|102,510|
|RTM [TTI Radius 8 1-pass]|NRF|TTI Radius 8 1-pass|yes|1x|3x|7x|14x|27x|3x|7x|14x|27x|
|RTM [TTI RX 2Pass mgpu]|Mcells/s|TTI RX 2Pass mgpu|yes|3,773|13,957|27,664|54,933|108,607|13,743|27,265|53,741|107,880|
|RTM [TTI RX 2Pass mgpu]|NRF|TTI RX 2Pass mgpu|yes|1x|4x|7x|15x|29x|4x|7x|14x|29x|
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_fef2ace9
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
SCALABILITY
Multi-GPU and Single-Node
|Application|Metric|Test Modules|Bigger is better|Dual Cascade Lake 6240 (CPU-Only)|1x A100 SXM4 80GB|2x A100 SXM4 80GB|4x A100 SXM4 80GB|8x A100 SXM4 80GB|1x A100 PCIe 80GB|2x A100 PCIe 80GB|4x A100 PCIe 80GB|8x A100 PCIe 80GB|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|SPECFEM3D|Total Time (Sec)|four_material_simple_model|no|1,268|77|40|21|13|78|41|22|15|
|SPECFEM3D|NRF|four_material_simple_model|yes|1x|19x|36x|68x|116x|19x|35x|67x|100x|
Detailed A30 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.0-AT_22.3
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|AMBER [PME-Cellulose_NPT_4fs]
|ns/day
|DC-Cellulose_NPT
|yes
|4.13
|89
|177
|355
|714
|AMBER [PME-Cellulose_NPT_4fs]
|NRF
|DC-Cellulose_NPT
|yes
|1x
|22x
|43x
|86x
|173x
|AMBER [PME-Cellulose_NVE_4fs]
|ns/day
|DC-Cellulose_NVE
|yes
|4.12
|91
|181
|362
|727
|AMBER [PME-Cellulose_NVE_4fs]
|NRF
|DC-Cellulose_NVE
|yes
|1x
|22x
|44x
|88x
|176x
|AMBER [PME-FactorIX_NPT_4fs]
|ns/day
|DC-FactorIX_NPT
|yes
|20.71
|406
|811
|1,616
|3,241
|AMBER [PME-FactorIX_NPT_4fs]
|NRF
|DC-FactorIX_NPT
|yes
|1x
|20x
|39x
|78x
|156x
|AMBER [PME-FactorIX_NVE_4fs]
|ns/day
|DC-FactorIX_NVE
|yes
|20.95
|418
|826
|1,651
|3,311
|AMBER [PME-FactorIX_NVE_4fs]
|NRF
|DC-FactorIX_NVE
|yes
|1x
|20x
|39x
|79x
|158x
|AMBER [PME-JAC_NPT_4fs]
|ns/day
|DC-JAC_NPT
|yes
|84.61
|1,503
|2,989
|5,973
|11,932
|AMBER [PME-JAC_NPT_4fs]
|NRF
|DC-JAC_NPT
|yes
|1x
|18x
|35x
|71x
|141x
|AMBER [PME-JAC_NVE_4fs]
|ns/day
|DC-JAC_NVE
|yes
|85.16
|1,531
|3,045
|6,077
|12,277
|AMBER [PME-JAC_NVE_4fs]
|NRF
|DC-JAC_NVE
|yes
|1x
|18x
|36x
|71x
|144x
|AMBER [PME-STMV_NPT_4fs]
|ns/day
|DC-STMV_NPT
|yes
|1.38
|29
|58
|116
|233
|AMBER [PME-STMV_NPT_4fs]
|NRF
|DC-STMV_NPT
|yes
|1x
|21x
|42x
|84x
|169x
|AMBER [FEP-GTI_Complex 1fs]
|ns/day
|FEP-GTI_Complex
|yes
|9.89
|99
|198
|395
|790
|AMBER [FEP-GTI_Complex 1fs]
|NRF
|FEP-GTI_Complex
|yes
|1x
|10x
|20x
|40x
|80x
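You can roughly cross-check the NRF rows against the raw throughput rows by dividing each GPU ns/day value by the CPU-only baseline. The sketch below does this for the A30 AMBER [PME-Cellulose_NPT_4fs] test. It computes only a single-server ratio, and the helper name `naive_speedup` is hypothetical. The published NRF comes from measurements on up to 8 CPU-only servers, with linear scaling beyond that, so for some applications it differs from this simple ratio.

```python
def naive_speedup(baseline: float, accelerated: float, bigger_is_better: bool = True) -> float:
    """Ratio of accelerated to baseline performance (hypothetical helper).

    Throughput metrics such as ns/day: bigger is better, so accelerated / baseline.
    Time metrics such as seconds: smaller is better, so the ratio is inverted.
    """
    if bigger_is_better:
        return accelerated / baseline
    return baseline / accelerated


# A30 AMBER [PME-Cellulose_NPT_4fs], ns/day, values copied from the table above
cpu_ns_per_day = 4.13
a30_ns_per_day = {1: 89, 2: 177, 4: 355, 8: 714}

for gpus, value in a30_ns_per_day.items():
    ratio = naive_speedup(cpu_ns_per_day, value)
    print(f"{gpus}x A30: {ratio:.1f}x (rounded {round(ratio)}x)")
```

For this AMBER test, the rounded ratios (22x, 43x, 86x, 173x) match the published NRF row. Do not assume the same holds for every application in these tables.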
Chroma
Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V 2021.08
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://jeffersonlab.github.io/chroma/
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x A30
|4x A30
|8x A30
|Chroma
|Total Time (Sec)
|szscl21_24_128
|no
|1,115
|35
|18
|11
|Chroma
|NRF
|szscl21_24_128
|yes
|1x
|33x
|62x
|103x
FUN3D
Engineering
Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow
VERSION
13.7 (update 1)
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|FUN3D
|Loop Time (Sec)
|dpw_wbt0_crs-3.6Mn_5
|no
|495
|111
|55
|29
|18
|FUN3D
|NRF
|dpw_wbt0_crs-3.6Mn_5
|yes
|1x
|5x
|11x
|21x
|34x
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2022.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|GROMACS [ADH Dodec]
|ns/day
|ADH Dodec
|yes
|67
|201
|287
|378
|-
|GROMACS [ADH Dodec]
|NRF
|ADH Dodec
|yes
|1x
|3x
|6x
|7x
|-
|GROMACS [Cellulose]
|ns/day
|Cellulose
|yes
|19
|60
|91
|119
|147
|GROMACS [Cellulose]
|NRF
|Cellulose
|yes
|1x
|3x
|5x
|9x
|11x
|GROMACS [STMV]
|ns/day
|STMV
|yes
|4
|12
|22
|41
|59
|GROMACS [STMV]
|NRF
|STMV
|yes
|1x
|3x
|5x
|10x
|14x
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V 4.5 Updated
ACCELERATED FEATURES
- Push, shift, and collision
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|GTC
|Mpush/Sec
|moi#proc.in
|yes
|35
|285
|531
|1,049
|1,774
|GTC
|NRF
|moi#proc.in
|yes
|1x
|8x
|15x
|31x
|52x
ICON
Weather and Climate
A global unified atmosphere model for numerical weather prediction and climate modeling research
VERSION
2.6.5_RC
ACCELERATED FEATURES
- Full model of dynamics and physics
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|ICON [SLAM 191 - 160KM - no radiation]
|Integrate_nh (sec)
|SLAM 191 levels 160 km resolution without radiation
|no
|2,431
|571
|354
|233
|206
|ICON [SLAM 191 - 160KM - no radiation]
|NRF
|SLAM 191 levels 160 km resolution without radiation
|yes
|1x
|4x
|7x
|10x
|12x
|ICON [QUBICC 160 km resolution]
|Integrate_nh (sec)
|SLAM 191 levels 160 km resolution with radiation
|no
|2,213
|502
|302
|193
|164
|ICON [QUBICC 160 km resolution]
|NRF
|SLAM 191 levels 160 km resolution with radiation
|yes
|1x
|4x
|7x
|11x
|13x
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
stable_23Jun2022_update1
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://lammps.sandia.gov/index.html
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|LAMMPS [LJ 2.5]
|ATOM-Time Steps/s
|LJ 2.5
|yes
|1.11E+08
|3.09E+08
|5.94E+08
|1.10E+09
|1.46E+09
|LAMMPS [LJ 2.5]
|NRF
|LJ 2.5
|yes
|1x
|3x
|5x
|10x
|14x
|LAMMPS [EAM]
|ATOM-Time Steps/s
|EAM
|yes
|5.33E+07
|1.37E+08
|2.58E+08
|4.70E+08
|7.30E+08
|LAMMPS [EAM]
|NRF
|EAM
|yes
|1x
|3x
|5x
|9x
|14x
|LAMMPS [ReaxFF/C]
|ATOM-Time Steps/s
|ReaxFF/C
|yes
|4.45E+05
|2.88E+06
|5.52E+06
|9.98E+06
|1.41E+07
|LAMMPS [ReaxFF/C]
|NRF
|ReaxFF/C
|yes
|1x
|9x
|17x
|31x
|44x
|LAMMPS [SNAP]
|ATOM-Time Steps/s
|SNAP
|yes
|1.08E+05
|1.11E+06
|2.19E+06
|4.37E+06
|8.54E+06
|LAMMPS [SNAP]
|NRF
|SNAP
|yes
|1x
|11x
|21x
|42x
|83x
|LAMMPS [Tersoff]
|ATOM-Time Steps/s
|Tersoff
|yes
|2.77E+07
|2.51E+08
|4.37E+08
|7.96E+08
|1.03E+09
|LAMMPS [Tersoff]
|NRF
|Tersoff
|yes
|1x
|9x
|16x
|29x
|38x
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
feature/gauge-action-quda_16a2d47119
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|MILC
|Total Time (Sec)
|Apex Medium
|no
|71,595
|4,710
|2,025
|1,087
|697
|MILC
|NRF
|Apex Medium
|yes
|1x
|17x
|39x
|72x
|113x
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
MORE INFORMATION
http://www.ks.uiuc.edu/Research/namd/
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|NAMD [apoa1_npt_cuda]
|Ave ns/day
|apoa1_npt_cuda
|yes
|19.15
|91
|181
|362
|726
|NAMD [apoa1_npt_cuda]
|NRF
|apoa1_npt_cuda
|yes
|1x
|5x
|9x
|19x
|38x
|NAMD [apoa1_nptsr_cuda]
|Ave ns/day
|apoa1_nptsr_cuda
|yes
|19.59
|94
|187
|371
|745
|NAMD [apoa1_nptsr_cuda]
|NRF
|apoa1_nptsr_cuda
|yes
|1x
|5x
|10x
|19x
|38x
|NAMD [apoa1_nve_cuda]
|Ave ns/day
|apoa1_nve_cuda
|yes
|20.75
|111
|221
|441
|882
|NAMD [apoa1_nve_cuda]
|NRF
|apoa1_nve_cuda
|yes
|1x
|5x
|11x
|21x
|42x
|NAMD [stmv_npt_cuda]
|Ave ns/day
|stmv_npt_cuda
|yes
|1.87
|7
|14
|29
|58
|NAMD [stmv_npt_cuda]
|NRF
|stmv_npt_cuda
|yes
|1x
|4x
|8x
|15x
|31x
|NAMD [stmv_nptsr_cuda]
|Ave ns/day
|stmv_nptsr_cuda
|yes
|1.81
|7
|15
|30
|59
|NAMD [stmv_nptsr_cuda]
|NRF
|stmv_nptsr_cuda
|yes
|1x
|4x
|8x
|16x
|33x
|NAMD [stmv_nve_cuda]
|Ave ns/day
|stmv_nve_cuda
|yes
|1.94
|8
|16
|32
|65
|NAMD [stmv_nve_cuda]
|NRF
|stmv_nve_cuda
|yes
|1x
|4x
|8x
|17x
|34x
RELION
Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
3.1.3
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|Relion [Plasmodium Ribosome]
|Total Wall Clock (Sec)
|MB numbers Plasmodium Ribosome on Relion-3.0
|no
|12,742
|3,417
|1,861
|1,423
|1,297
|Relion [Plasmodium Ribosome]
|NRF
|MB numbers Plasmodium Ribosome on Relion-3.0
|yes
|1x
|4x
|7x
|9x
|10x
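The multi-GPU columns can also be read as strong-scaling efficiency. This is the single-GPU time divided by N times the N-GPU time, and a value of 1.0 would mean perfect linear scaling on a fixed problem. The sketch below applies it to the A30 RELION wall-clock times above. This is the standard strong-scaling formula, not a metric the tables report, and the helper name is hypothetical.

```python
def strong_scaling_efficiency(t_one: float, t_n: float, n: int) -> float:
    """Fixed-problem efficiency on n devices relative to one: t_one / (n * t_n)."""
    return t_one / (n * t_n)


# RELION [Plasmodium Ribosome], total wall clock seconds on A30, from the table above
times = {1: 3417, 2: 1861, 4: 1423, 8: 1297}

for n, t in times.items():
    eff = strong_scaling_efficiency(times[1], t, n)
    print(f"{n}x A30: {eff:.0%}")
```

Efficiency drops from about 92% on 2 GPUs to about 60% on 4 and about 33% on 8. This matches the flattening NRF row (4x, 7x, 9x, 10x) for this single-node workload.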
RTM
Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
nvidia_2021_05
ACCELERATED FEATURES
- Batch algorithm
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|RTM [Isotropic Radius 4]
|Mcells/s
|Isotropic Radius 4
|yes
|11,318
|44,051
|87,806
|175,438
|350,760
|RTM [Isotropic Radius 4]
|NRF
|Isotropic Radius 4
|yes
|1x
|4x
|8x
|16x
|31x
|RTM [TTI Radius 8 1-pass]
|Mcells/s
|TTI Radius 8 1-pass
|yes
|3,773
|6,757
|13,361
|26,710
|53,281
|RTM [TTI Radius 8 1-pass]
|NRF
|TTI Radius 8 1-pass
|yes
|1x
|2x
|4x
|7x
|14x
|RTM [TTI RX 2Pass mgpu]
|Mcells/s
|TTI RX 2Pass mgpu
|yes
|3,773
|7,026
|13,899
|27,642
|55,140
|RTM [TTI RX 2Pass mgpu]
|NRF
|TTI RX 2Pass mgpu
|yes
|1x
|2x
|4x
|7x
|15x
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_fef2ace9
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
SCALABILITY
Multi-GPU and Single-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A30
|2x A30
|4x A30
|8x A30
|SPECFEM3D
|Total Time (Sec)
|four_material_simple_model
|no
|1,268
|156
|80
|41
|23
|SPECFEM3D
|NRF
|four_material_simple_model
|yes
|1x
|9x
|18x
|35x
|64x
Detailed A40 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.0-AT_22.3
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|AMBER [PME-Cellulose_NPT_4fs]
|ns/day
|DC-Cellulose_NPT
|yes
|4.13
|97
|195
|390
|781
|AMBER [PME-Cellulose_NPT_4fs]
|NRF
|DC-Cellulose_NPT
|yes
|1x
|23x
|47x
|94x
|189x
|AMBER [PME-Cellulose_NVE_4fs]
|ns/day
|DC-Cellulose_NVE
|yes
|4.12
|98
|198
|396
|794
|AMBER [PME-Cellulose_NVE_4fs]
|NRF
|DC-Cellulose_NVE
|yes
|1x
|24x
|48x
|96x
|193x
|AMBER [PME-FactorIX_NPT_4fs]
|ns/day
|DC-FactorIX_NPT
|yes
|20.71
|486
|984
|1,965
|3,954
|AMBER [PME-FactorIX_NPT_4fs]
|NRF
|DC-FactorIX_NPT
|yes
|1x
|23x
|48x
|95x
|191x
|AMBER [PME-FactorIX_NVE_4fs]
|ns/day
|DC-FactorIX_NVE
|yes
|20.95
|497
|1,006
|2,015
|4,022
|AMBER [PME-FactorIX_NVE_4fs]
|NRF
|DC-FactorIX_NVE
|yes
|1x
|24x
|48x
|96x
|192x
|AMBER [PME-JAC_NPT_4fs]
|ns/day
|DC-JAC_NPT
|yes
|84.61
|1,922
|3,889
|7,780
|15,568
|AMBER [PME-JAC_NPT_4fs]
|NRF
|DC-JAC_NPT
|yes
|1x
|23x
|46x
|92x
|184x
|AMBER [PME-JAC_NVE_4fs]
|ns/day
|DC-JAC_NVE
|yes
|85.16
|1,948
|3,946
|7,906
|16,037
|AMBER [PME-JAC_NVE_4fs]
|NRF
|DC-JAC_NVE
|yes
|1x
|23x
|46x
|93x
|188x
|AMBER [PME-STMV_NPT_4fs]
|ns/day
|DC-STMV_NPT
|yes
|1.38
|32
|63
|127
|254
|AMBER [PME-STMV_NPT_4fs]
|NRF
|DC-STMV_NPT
|yes
|1x
|23x
|46x
|92x
|184x
|AMBER [FEP-GTI_Complex 1fs]
|ns/day
|FEP-GTI_Complex
|yes
|9.89
|116
|232
|463
|926
|AMBER [FEP-GTI_Complex 1fs]
|NRF
|FEP-GTI_Complex
|yes
|1x
|12x
|23x
|47x
|94x
Chroma
Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V 2021.08
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://jeffersonlab.github.io/chroma/
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|Chroma
|Total Time (Sec)
|szscl21_24_128
|no
|1,115
|78
|41
|22
|13
|Chroma
|NRF
|szscl21_24_128
|yes
|1x
|15x
|28x
|52x
|89x
FUN3D
Engineering
Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow
VERSION
13.7 (update 1)
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|FUN3D
|Loop Time (Sec)
|dpw_wbt0_crs-3.6Mn_5
|no
|495
|231
|117
|59
|32
|FUN3D
|NRF
|dpw_wbt0_crs-3.6Mn_5
|yes
|1x
|2x
|5x
|10x
|19x
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2022.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|GROMACS [ADH Dodec]
|ns/day
|ADH Dodec
|yes
|67
|340
|379
|505
|-
|GROMACS [ADH Dodec]
|NRF
|ADH Dodec
|yes
|1x
|7x
|8x
|10x
|-
|GROMACS [Cellulose]
|ns/day
|Cellulose
|yes
|19
|77
|110
|160
|177
|GROMACS [Cellulose]
|NRF
|Cellulose
|yes
|1x
|5x
|8x
|12x
|13x
|GROMACS [STMV]
|ns/day
|STMV
|yes
|4
|20
|38
|61
|75
|GROMACS [STMV]
|NRF
|STMV
|yes
|1x
|5x
|9x
|15x
|18x
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V 4.5 Updated
ACCELERATED FEATURES
- Push, shift, and collision
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|GTC
|Mpush/Sec
|moi#proc.in
|yes
|35
|305
|563
|1,112
|1,854
|GTC
|NRF
|moi#proc.in
|yes
|1x
|9x
|16x
|32x
|54x
ICON
Weather and Climate
A global unified atmosphere model for numerical weather prediction and climate modeling research
VERSION
2.6.5_RC
ACCELERATED FEATURES
- Full model of dynamics and physics
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|ICON [SLAM 191 - 160KM - no radiation]
|Integrate_nh (sec)
|SLAM 191 levels 160 km resolution without radiation
|no
|2,431
|741
|420
|262
|223
|ICON [SLAM 191 - 160KM - no radiation]
|NRF
|SLAM 191 levels 160 km resolution without radiation
|yes
|1x
|3x
|6x
|9x
|11x
|ICON [QUBICC 160 km resolution]
|Integrate_nh (sec)
|SLAM 191 levels 160 km resolution with radiation
|no
|2,213
|747
|415
|253
|192
|ICON [QUBICC 160 km resolution]
|NRF
|SLAM 191 levels 160 km resolution with radiation
|yes
|1x
|3x
|5x
|9x
|12x
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
stable_23Jun2022_update1
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://lammps.sandia.gov/index.html
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|LAMMPS [ReaxFF/C]
|ATOM-Time Steps/s
|ReaxFF/C
|yes
|4.45E+05
|6.85E+05
|1.32E+06
|2.50E+06
|4.28E+06
|LAMMPS [ReaxFF/C]
|NRF
|ReaxFF/C
|yes
|1x
|2x
|3x
|7x
|13x
|LAMMPS [SNAP]
|ATOM-Time Steps/s
|SNAP
|yes
|1.08E+05
|2.43E+05
|4.87E+05
|9.74E+05
|1.93E+06
|LAMMPS [SNAP]
|NRF
|SNAP
|yes
|1x
|2x
|5x
|9x
|19x
|LAMMPS [Tersoff]
|ATOM-Time Steps/s
|Tersoff
|yes
|2.77E+07
|5.23E+07
|1.03E+08
|2.02E+08
|3.51E+08
|LAMMPS [Tersoff]
|NRF
|Tersoff
|yes
|1x
|2x
|4x
|7x
|13x
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
feature/gauge-action-quda_16a2d47119
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|MILC
|Total Time (Sec)
|Apex Medium
|no
|71,595
|6,005
|3,094
|1,762
|1,074
|MILC
|NRF
|Apex Medium
|yes
|1x
|13x
|25x
|45x
|73x
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
MORE INFORMATION
http://www.ks.uiuc.edu/Research/namd/
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|NAMD [apoa1_npt_cuda]
|Ave ns/day
|apoa1_npt_cuda
|yes
|19.15
|103
|208
|416
|835
|NAMD [apoa1_npt_cuda]
|NRF
|apoa1_npt_cuda
|yes
|1x
|5x
|11x
|22x
|44x
|NAMD [apoa1_nptsr_cuda]
|Ave ns/day
|apoa1_nptsr_cuda
|yes
|19.59
|109
|220
|440
|882
|NAMD [apoa1_nptsr_cuda]
|NRF
|apoa1_nptsr_cuda
|yes
|1x
|6x
|11x
|22x
|45x
|NAMD [apoa1_nve_cuda]
|Ave ns/day
|apoa1_nve_cuda
|yes
|20.75
|144
|292
|585
|1,172
|NAMD [apoa1_nve_cuda]
|NRF
|apoa1_nve_cuda
|yes
|1x
|7x
|14x
|28x
|56x
|NAMD [stmv_npt_cuda]
|Ave ns/day
|stmv_npt_cuda
|yes
|1.87
|8
|15
|30
|61
|NAMD [stmv_npt_cuda]
|NRF
|stmv_npt_cuda
|yes
|1x
|4x
|8x
|16x
|32x
|NAMD [stmv_nptsr_cuda]
|Ave ns/day
|stmv_nptsr_cuda
|yes
|1.81
|8
|16
|32
|64
|NAMD [stmv_nptsr_cuda]
|NRF
|stmv_nptsr_cuda
|yes
|1x
|4x
|9x
|18x
|35x
|NAMD [stmv_nve_cuda]
|Ave ns/day
|stmv_nve_cuda
|yes
|1.94
|10
|20
|39
|79
|NAMD [stmv_nve_cuda]
|NRF
|stmv_nve_cuda
|yes
|1x
|5x
|10x
|20x
|41x
RELION
Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
3.1.3
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|Relion [Plasmodium Ribosome]
|Total Wall Clock (Sec)
|MB numbers Plasmodium Ribosome on Relion-3.0
|no
|12,742
|3,207
|1,716
|1,344
|1,323
|Relion [Plasmodium Ribosome]
|NRF
|MB numbers Plasmodium Ribosome on Relion-3.0
|yes
|1x
|4x
|7x
|9x
|10x
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_fef2ace9
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
SCALABILITY
Multi-GPU and Single-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x A40
|2x A40
|4x A40
|8x A40
|SPECFEM3D
|Total Time (Sec)
|four_material_simple_model
|no
|1,268
|203
|103
|53
|29
|SPECFEM3D
|NRF
|four_material_simple_model
|yes
|1x
|6x
|14x
|27x
|50x
Detailed V100 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.0-AT_22.3
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|AMBER [PME-Cellulose_NPT_4fs]
|ns/day
|DC-Cellulose_NPT
|yes
|4.13
|100
|202
|406
|805
|97
|199
|400
|808
|AMBER [PME-Cellulose_NPT_4fs]
|NRF
|DC-Cellulose_NPT
|yes
|1x
|24x
|49x
|98x
|195x
|24x
|48x
|97x
|196x
|AMBER [PME-Cellulose_NVE_4fs]
|ns/day
|DC-Cellulose_NVE
|yes
|4.12
|101
|205
|412
|818
|99
|202
|406
|815
|AMBER [PME-Cellulose_NVE_4fs]
|NRF
|DC-Cellulose_NVE
|yes
|1x
|25x
|50x
|100x
|198x
|24x
|49x
|98x
|198x
|AMBER [PME-FactorIX_NPT_4fs]
|ns/day
|DC-FactorIX_NPT
|yes
|20.71
|483
|953
|1,915
|3,787
|470
|936
|1,873
|3,784
|AMBER [PME-FactorIX_NPT_4fs]
|NRF
|DC-FactorIX_NPT
|yes
|1x
|23x
|46x
|92x
|183x
|23x
|45x
|90x
|183x
|AMBER [PME-FactorIX_NVE_4fs]
|ns/day
|DC-FactorIX_NVE
|yes
|20.95
|496
|978
|1,964
|3,892
|475
|959
|1,926
|3,869
|AMBER [PME-FactorIX_NVE_4fs]
|NRF
|DC-FactorIX_NVE
|yes
|1x
|24x
|47x
|94x
|186x
|23x
|46x
|92x
|185x
|AMBER [PME-JAC_NPT_4fs]
|ns/day
|DC-JAC_NPT
|yes
|84.61
|1,870
|3,293
|6,613
|13,031
|1,789
|3,293
|6,598
|13,149
|AMBER [PME-JAC_NPT_4fs]
|NRF
|DC-JAC_NPT
|yes
|1x
|22x
|39x
|78x
|154x
|21x
|39x
|78x
|155x
|AMBER [PME-JAC_NVE_4fs]
|ns/day
|DC-JAC_NVE
|yes
|85.16
|1,907
|3,389
|6,795
|13,371
|1,822
|3,387
|6,779
|13,533
|AMBER [PME-JAC_NVE_4fs]
|NRF
|DC-JAC_NVE
|yes
|1x
|22x
|40x
|80x
|157x
|21x
|40x
|80x
|159x
|AMBER [PME-STMV_NPT_4fs]
|ns/day
|DC-STMV_NPT
|yes
|1.38
|31
|62
|125
|249
|28
|57
|113
|226
|AMBER [PME-STMV_NPT_4fs]
|NRF
|DC-STMV_NPT
|yes
|1x
|23x
|45x
|90x
|180x
|20x
|41x
|82x
|164x
|AMBER [FEP-GTI_Complex 1fs]
|ns/day
|FEP-GTI_Complex
|yes
|9.89
|120
|240
|480
|960
|122
|245
|489
|979
|AMBER [FEP-GTI_Complex 1fs]
|NRF
|FEP-GTI_Complex
|yes
|1x
|12x
|24x
|49x
|97x
|12x
|25x
|49x
|99x
Chroma
Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V 2021.08
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://jeffersonlab.github.io/chroma/
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|Chroma
|Total Time (Sec)
|szscl21_24_128
|no
|1,115
|165
|31
|17
|10
|142
|28
|15
|13
|Chroma
|NRF
|szscl21_24_128
|yes
|1x
|7x
|37x
|68x
|111x
|8x
|41x
|77x
|85x
FUN3D
Engineering
Suite of tools actively developed at NASA for Aeronautics and Space Technology by modeling fluid flow
VERSION
13.7 (update 1)
ACCELERATED FEATURES
- Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|FUN3D
|Loop Time (Sec)
|dpw_wbt0_crs-3.6Mn_5
|no
|495
|99
|50
|26
|15
|88
|45
|23
|14
|FUN3D
|NRF
|dpw_wbt0_crs-3.6Mn_5
|yes
|1x
|5x
|12x
|24x
|41x
|6x
|14x
|26x
|44x
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2022.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|1x RTX6000
|2x RTX6000
|4x RTX6000
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|GROMACS [ADH Dodec]
|ns/day
|ADH Dodec
|yes
|67
|266
|311
|472
|251
|296
|-
|270
|288
|330
|GROMACS [ADH Dodec]
|NRF
|ADH Dodec
|yes
|1x
|5x
|6x
|9x
|5x
|6x
|-
|5x
|6x
|7x
|GROMACS [Cellulose]
|ns/day
|Cellulose
|yes
|19
|71
|103
|156
|60
|83
|-
|73
|98
|-
|GROMACS [Cellulose]
|NRF
|Cellulose
|yes
|1x
|4x
|6x
|12x
|3x
|5x
|-
|4x
|6x
|-
|GROMACS [STMV]
|ns/day
|STMV
|yes
|4
|16
|30
|53
|13
|25
|32
|16
|29
|38
|GROMACS [STMV]
|NRF
|STMV
|yes
|1x
|3x
|7x
|13x
|3x
|6x
|7x
|3x
|7x
|9x
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V 4.5 Updated
ACCELERATED FEATURES
- Push, shift, and collision
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|GTC
|Mpush/Sec
|moi#proc.in
|yes
|35
|271
|510
|1,011
|1,796
|298
|552
|1,081
|1,945
|GTC
|NRF
|moi#proc.in
|yes
|1x
|8x
|15x
|29x
|52x
|9x
|16x
|31x
|57x
ICON
Weather and Climate
A global unified atmosphere model for numerical weather prediction and climate modeling research
VERSION
2.6.5_RC
ACCELERATED FEATURES
- Full model of dynamics and physics
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|ICON [SLAM 191 - 160KM - no radiation]
|Integrate_nh (sec)
|SLAM 191 levels 160 km resolution without radiation
|no
|2,431
|591
|353
|223
|167
|819
|578
|248
|ICON [SLAM 191 - 160KM - no radiation]
|NRF
|SLAM 191 levels 160 km resolution without radiation
|yes
|1x
|4x
|7x
|11x
|15x
|3x
|4x
|10x
|ICON [QUBICC 160 km resolution]
|Integrate_nh (sec)
|SLAM 191 levels 160 km resolution with radiation
|no
|2,213
|514
|304
|192
|143
|697
|438
|215
|ICON [QUBICC 160 km resolution]
|NRF
|SLAM 191 levels 160 km resolution with radiation
|yes
|1x
|4x
|7x
|12x
|16x
|3x
|5x
|10x
LAMMPS
Molecular Dynamics
Classical molecular dynamics package
VERSION
stable_23Jun2022_update1
ACCELERATED FEATURES
- Lennard-Jones, Gay-Berne, Tersoff, many more potentials
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://lammps.sandia.gov/index.html
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|LAMMPS [LJ 2.5]
|ATOM-Time Steps/s
|LJ 2.5
|yes
|1.11E+08
|3.41E+08
|6.34E+08
|1.24E+09
|2.24E+09
|3.45E+08
|6.23E+08
|1.15E+09
|1.87E+09
|LAMMPS [LJ 2.5]
|NRF
|LJ 2.5
|yes
|1x
|3x
|6x
|11x
|21x
|3x
|6x
|11x
|17x
|LAMMPS [EAM]
|ATOM-Time Steps/s
|EAM
|yes
|5.33E+07
|1.23E+08
|2.67E+08
|5.39E+08
|9.74E+08
|1.25E+08
|2.66E+08
|5.15E+08
|8.23E+08
|LAMMPS [EAM]
|NRF
|EAM
|yes
|1x
|2x
|5x
|11x
|19x
|2x
|5x
|10x
|16x
|LAMMPS [ReaxFF/C]
|ATOM-Time Steps/s
|ReaxFF/C
|yes
|4.45E+05
|3.23E+06
|6.09E+06
|1.14E+07
|1.94E+07
|3.44E+06
|6.42E+06
|1.19E+07
|1.91E+07
|LAMMPS [ReaxFF/C]
|NRF
|ReaxFF/C
|yes
|1x
|10x
|19x
|35x
|60x
|11x
|20x
|37x
|59x
|LAMMPS [SNAP]
|ATOM-Time Steps/s
|SNAP
|yes
|1.08E+05
|1.42E+06
|2.86E+06
|5.69E+06
|1.14E+07
|1.40E+06
|2.80E+06
|5.58E+06
|1.12E+07
|LAMMPS [SNAP]
|NRF
|SNAP
|yes
|1x
|14x
|28x
|55x
|111x
|14x
|27x
|54x
|108x
|LAMMPS [Tersoff]
|ATOM-Time Steps/s
|Tersoff
|yes
|2.77E+07
|2.71E+08
|4.95E+08
|9.62E+08
|1.80E+09
|2.81E+08
|5.18E+08
|9.83E+08
|1.56E+09
|LAMMPS [Tersoff]
|NRF
|Tersoff
|yes
|1x
|10x
|18x
|35x
|66x
|10x
|19x
|36x
|57x
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
feature/gauge-action-quda_16a2d47119
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|MILC
|Total Time (Sec)
|Apex Medium
|no
|71,595
|4,737
|2,347
|1,229
|689
|3,864
|2,020
|1,103
|1,068
|MILC
|NRF
|Apex Medium
|yes
|1x
|17x
|34x
|64x
|114x
|20x
|39x
|71x
|74x
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
GPU, AMD CPU V 3.0a13; Intel CPU V 2.15a AVX512
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Up to 100M atom capable, multi-GPU, single node
MORE INFORMATION
http://www.ks.uiuc.edu/Research/namd/
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x RTX6000
|2x RTX6000
|4x RTX6000
|8x RTX6000
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|NAMD [apoa1_npt_cuda]
|Ave ns/day
|apoa1_npt_cuda
|yes
|19.15
|111
|223
|449
|890
|66
|133
|266
|532
|114
|227
|455
|905
|NAMD [apoa1_npt_cuda]
|NRF
|apoa1_npt_cuda
|yes
|1x
|6x
|12x
|23x
|46x
|3x
|7x
|14x
|28x
|6x
|12x
|24x
|47x
|NAMD [apoa1_nptsr_cuda]
|Ave ns/day
|apoa1_nptsr_cuda
|yes
|19.59
|116
|235
|470
|935
|70
|141
|282
|562
|119
|236
|473
|943
|NAMD [apoa1_nptsr_cuda]
|NRF
|apoa1_nptsr_cuda
|yes
|1x
|6x
|12x
|24x
|48x
|4x
|7x
|14x
|29x
|6x
|12x
|24x
|48x
|NAMD [apoa1_nve_cuda]
|Ave ns/day
|apoa1_nve_cuda
|yes
|20.75
|142
|285
|571
|1,148
|89
|179
|358
|717
|144
|286
|573
|1,145
|NAMD [apoa1_nve_cuda]
|NRF
|apoa1_nve_cuda
|yes
|1x
|7x
|14x
|28x
|55x
|4x
|9x
|17x
|35x
|7x
|14x
|28x
|55x
|NAMD [stmv_npt_cuda]
|Ave ns/day
|stmv_npt_cuda
|yes
|1.87
|8
|17
|34
|68
|5
|10
|21
|41
|9
|18
|35
|70
|NAMD [stmv_npt_cuda]
|NRF
|stmv_npt_cuda
|yes
|1x
|5x
|9x
|18x
|36x
|3x
|6x
|11x
|22x
|5x
|9x
|19x
|38x
|NAMD [stmv_nptsr_cuda]
|Ave ns/day
|stmv_nptsr_cuda
|yes
|1.81
|9
|18
|36
|71
|5
|11
|22
|44
|9
|18
|36
|72
|NAMD [stmv_nptsr_cuda]
|NRF
|stmv_nptsr_cuda
|yes
|1x
|5x
|10x
|20x
|39x
|3x
|6x
|12x
|24x
|5x
|10x
|20x
|40x
|NAMD [stmv_nve_cuda]
|Ave ns/day
|stmv_nve_cuda
|yes
|1.94
|10
|20
|40
|79
|6
|13
|26
|51
|10
|20
|40
|80
|NAMD [stmv_nve_cuda]
|NRF
|stmv_nve_cuda
|yes
|1x
|5x
|10x
|20x
|41x
|3x
|7x
|13x
|26x
|5x
|10x
|21x
|41x
NV-WRFg
Numerical Weather Prediction
Numerical weather prediction system designed for both atmospheric research and operational forecasting applications
VERSION
3.8.1 NCAR (CPU) / 3.8.1 WRFg 10_28 (GPU)
ACCELERATED FEATURES
- Dynamics modules
- Several Physics modules
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|4x V100 SXM2 32GB
|4x V100S PCIe 32GB
|NV-WRFg
|Seconds / Timestep
|Conus_2.5k_JA
|no
|6
|0.62
|0.68
|NV-WRFg
|NRF
|Conus_2.5k_JA
|yes
|1x
|10x
|9x
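For rows whose "Bigger is better" column reads "no" (elapsed time), the speedup ratio is inverted: baseline time divided by accelerated time. The sketch below applies this to the NV-WRFg Conus_2.5k_JA values above. The helper name is hypothetical. Here the rounded ratios happen to match the published NRF, but that is not guaranteed for every application, because NRF is derived from multi-server CPU measurements.

```python
def time_speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Speedup for a smaller-is-better time metric: baseline time / accelerated time."""
    if gpu_seconds <= 0:
        raise ValueError("accelerated time must be positive")
    return cpu_seconds / gpu_seconds


# NV-WRFg [Conus_2.5k_JA], time metric from the table above (smaller is better)
cpu = 6.0
gpu_times = {"4x V100 SXM2 32GB": 0.62, "4x V100S PCIe 32GB": 0.68}

for config, seconds in gpu_times.items():
    ratio = time_speedup(cpu, seconds)
    print(f"{config}: {ratio:.1f}x (rounded {round(ratio)}x)")
```

The rounded results (10x and 9x) agree with the NRF row for this test.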
Quantum Espresso
Material Science (Quantum Chemistry)
An open-source suite of computer codes for electronic structure calculations and materials modeling at the nanoscale
VERSION
V7.0 CPU; V7.1 GPU
ACCELERATED FEATURES
- linear algebra (matrix multiply)
- explicit computational kernels
- 3D FFTs
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|Quantum Espresso
|Total CPU Time (Sec)
|AUSURF112-jR
|no
|718
|270
|133
|82
|58
|260
|130
|88
|69
|Quantum Espresso
|NRF
|AUSURF112-jR
|yes
|1x
|3x
|6x
|10x
|14x
|3x
|6x
|9x
|12x
RELION
Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
3.1.3
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|Relion [Plasmodium Ribosome]
|Total Wall Clock (Sec)
|MB numbers Plasmodium Ribosome on Relion-3.0
|no
|12,742
|3,417
|2,095
|3,443
|2,083
|Relion [Plasmodium Ribosome]
|NRF
|MB numbers Plasmodium Ribosome on Relion-3.0
|yes
|1x
|4x
|6x
|4x
|6x
RTM
Geoscience
Reverse time migration (RTM) modeling is a critical component in the seismic processing workflow of oil and gas exploration
VERSION
nvidia_2021_05
ACCELERATED FEATURES
- Batch algorithm
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|RTM [Isotropic Radius 4]
|Mcells/s
|Isotropic Radius 4
|yes
|11,318
|38,091
|75,978
|152,022
|303,986
|46,037
|91,790
|183,515
|367,252
|RTM [Isotropic Radius 4]
|NRF
|Isotropic Radius 4
|yes
|1x
|3x
|7x
|13x
|27x
|4x
|8x
|16x
|32x
|RTM [TTI Radius 8 1-pass]
|Mcells/s
|TTI Radius 8 1-pass
|yes
|3,773
|8,538
|16,885
|33,070
|65,732
|9,276
|18,304
|36,393
|72,591
|RTM [TTI Radius 8 1-pass]
|NRF
|TTI Radius 8 1-pass
|yes
|1x
|2x
|4x
|9x
|17x
|2x
|5x
|10x
|19x
|RTM [TTI RX 2Pass mgpu]
|Mcells/s
|TTI RX 2Pass mgpu
|yes
|3,773
|7,165
|14,203
|28,177
|56,235
|8,491
|16,871
|33,547
|66,849
|RTM [TTI RX 2Pass mgpu]
|NRF
|TTI RX 2Pass mgpu
|yes
|1x
|2x
|4x
|7x
|15x
|2x
|4x
|9x
|18x
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_fef2ace9
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
SCALABILITY
Multi-GPU and Single-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|1x V100 SXM2 32GB
|2x V100 SXM2 32GB
|4x V100 SXM2 32GB
|8x V100 SXM2 32GB
|1x V100S PCIe 32GB
|2x V100S PCIe 32GB
|4x V100S PCIe 32GB
|8x V100S PCIe 32GB
|SPECFEM3D
|Total Time (Sec)
|four_material_simple_model
|no
|1,268
|159
|82
|44
|25
|131
|68
|37
|23
|SPECFEM3D
|NRF
|four_material_simple_model
|yes
|1x
|9x
|18x
|33x
|58x
|11x
|21x
|39x
|63x
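For "smaller is better" metrics such as Total Time, the single-server speedup is the CPU-only time divided by the GPU time. The sketch below computes that raw ratio for the SPECFEM3D V100 SXM2 results and prints it next to the published NRF. The helper name `raw_speedup` is hypothetical. The raw ratio is lower than the NRF column in every row here. That is plausibly because NRF is derived from measured CPU-only multi-server scaling, as described at the top of this page, rather than from the single-server ratio. The sketch does not reproduce the NRF calculation, since those multi-server CPU measurements are not listed.

```python
# SPECFEM3D four_material_simple_model results from the table above
# (Total Time in seconds; smaller is better).
CPU_SECONDS = 1_268
gpu_seconds = {
    "1x V100 SXM2 32GB": 159,
    "2x V100 SXM2 32GB": 82,
    "4x V100 SXM2 32GB": 44,
    "8x V100 SXM2 32GB": 25,
}
published_nrf = {
    "1x V100 SXM2 32GB": 9,
    "2x V100 SXM2 32GB": 18,
    "4x V100 SXM2 32GB": 33,
    "8x V100 SXM2 32GB": 58,
}


def raw_speedup(cpu_time: float, gpu_time: float) -> float:
    """Speedup for a time-to-solution metric: baseline time / new time."""
    return cpu_time / gpu_time


for config, seconds in gpu_seconds.items():
    ratio = raw_speedup(CPU_SECONDS, seconds)
    print(f"{config}: raw {ratio:.1f}x vs published NRF {published_nrf[config]}x")
```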
Detailed T4 application performance data is located below in alphabetical order.
AMBER
Molecular Dynamics
Suite of programs to simulate molecular dynamics on biomolecules
VERSION
22.0-AT_22.3
ACCELERATED FEATURES
- PMEMD Explicit Solvent and GB Implicit Solvent
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x T4 PCIe
|4x T4 PCIe
|8x T4 PCIe
|AMBER [PME-Cellulose_NPT_4fs]
|ns/day
|DC-Cellulose_NPT
|yes
|4.13
|61
|121
|245
|AMBER [PME-Cellulose_NPT_4fs]
|NRF
|DC-Cellulose_NPT
|yes
|1x
|15x
|29x
|59x
|AMBER [PME-Cellulose_NVE_4fs]
|ns/day
|DC-Cellulose_NVE
|yes
|4.12
|62
|123
|248
|AMBER [PME-Cellulose_NVE_4fs]
|NRF
|DC-Cellulose_NVE
|yes
|1x
|15x
|30x
|60x
|AMBER [PME-FactorIX_NPT_4fs]
|ns/day
|DC-FactorIX_NPT
|yes
|20.71
|285
|603
|1,213
|AMBER [PME-FactorIX_NPT_4fs]
|NRF
|DC-FactorIX_NPT
|yes
|1x
|14x
|29x
|59x
|AMBER [PME-FactorIX_NVE_4fs]
|ns/day
|DC-FactorIX_NVE
|yes
|20.95
|292
|616
|1,202
|AMBER [PME-FactorIX_NVE_4fs]
|NRF
|DC-FactorIX_NVE
|yes
|1x
|14x
|29x
|57x
|AMBER [PME-JAC_NPT_4fs]
|ns/day
|DC-JAC_NPT
|yes
|84.61
|1,245
|2,365
|4,491
|AMBER [PME-JAC_NPT_4fs]
|NRF
|DC-JAC_NPT
|yes
|1x
|15x
|28x
|53x
|AMBER [PME-JAC_NVE_4fs]
|ns/day
|DC-JAC_NVE
|yes
|85.16
|1,259
|2,504
|4,979
|AMBER [PME-JAC_NVE_4fs]
|NRF
|DC-JAC_NVE
|yes
|1x
|15x
|29x
|58x
|AMBER [PME-STMV_NPT_4fs]
|ns/day
|DC-STMV_NPT
|yes
|1.38
|21
|42
|83
|AMBER [PME-STMV_NPT_4fs]
|NRF
|DC-STMV_NPT
|yes
|1x
|15x
|30x
|60x
|AMBER [FEP-GTI_Complex 1fs]
|ns/day
|FEP-GTI_Complex
|yes
|9.89
|107
|213
|427
|AMBER [FEP-GTI_Complex 1fs]
|NRF
|FEP-GTI_Complex
|yes
|1x
|11x
|22x
|43x
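Because ns/day measures simulated time per wall-clock day, it converts directly into the time needed to reach a target trajectory length. The sketch below does this for the AMBER DC-JAC_NPT results on T4. The 1,000 ns target and the helper name `hours_to_simulate` are hypothetical choices for illustration, not part of the benchmark.

```python
# AMBER DC-JAC_NPT throughput from the T4 table above,
# in nanoseconds of simulated time per wall-clock day.
ns_per_day = {
    "Dual Cascade Lake 6240 (CPU-Only)": 84.61,
    "2x T4 PCIe": 1_245,
    "4x T4 PCIe": 2_365,
    "8x T4 PCIe": 4_491,
}

TARGET_NS = 1_000  # hypothetical goal: a 1 microsecond trajectory


def hours_to_simulate(target_ns: float, rate_ns_per_day: float) -> float:
    """Wall-clock hours needed to simulate target_ns at a fixed ns/day rate."""
    return target_ns / rate_ns_per_day * 24


for config, rate in ns_per_day.items():
    print(f"{config}: {hours_to_simulate(TARGET_NS, rate):.1f} h for {TARGET_NS} ns")
```

On these figures, 8x T4 PCIe needs about 5.3 hours against about 284 hours on the CPU-only server. For this ns/day benchmark the rounded ratio of the two rates (4,491 / 84.61 ≈ 53) matches the 53x NRF listed for DC-JAC_NPT.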
Chroma
Physics
Lattice Quantum Chromodynamics (LQCD)
VERSION
V 2021.08
ACCELERATED FEATURES
- Wilson-clover fermions, Krylov solvers, Domain-decomposition
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
http://jeffersonlab.github.io/chroma/
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x T4 PCIe
|4x T4 PCIe
|8x T4 PCIe
|Chroma
|Total Time (Sec)
|szscl21_24_128
|no
|1,115
|117
|40
|26
|Chroma
|NRF
|szscl21_24_128
|yes
|1x
|10x
|28x
|44x
GROMACS
Molecular Dynamics
Simulation of biochemical molecules with complicated bond interactions
VERSION
2022.3
ACCELERATED FEATURES
- Implicit (5x), Explicit (2x) Solvent
SCALABILITY
Multi-GPU, Single Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x T4 PCIe
|4x T4 PCIe
|GROMACS [ADH Dodec]
|ns/day
|ADH Dodec
|yes
|67
|163
|238
|GROMACS [ADH Dodec]
|NRF
|ADH Dodec
|yes
|1x
|3x
|5x
|GROMACS [STMV]
|ns/day
|STMV
|yes
|4
|-
|20
|GROMACS [STMV]
|NRF
|STMV
|yes
|1x
|-
|5x
GTC
Physics
GTC is used for Gyrokinetic Particle Simulation of Turbulent Transport in Burning Plasmas
VERSION
V 4.5 Updated
ACCELERATED FEATURES
- Push, shift, and collision
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x T4 PCIe
|4x T4 PCIe
|8x T4 PCIe
|GTC
|Mpush/Sec
|moi#proc.in
|yes
|35
|236
|466
|893
|GTC
|NRF
|moi#proc.in
|yes
|1x
|7x
|14x
|26x
MILC
Physics
Lattice Quantum Chromodynamics (LQCD) codes simulate how elementary particles are formed and bound by the “strong force” to create larger particles like protons and neutrons
VERSION
feature/gauge-action-quda_16a2d47119
ACCELERATED FEATURES
- Staggered fermions, Krylov solvers, Gauge-link fattening
SCALABILITY
Multi-GPU and Multi-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x T4 PCIe
|4x T4 PCIe
|8x T4 PCIe
|MILC
|Total Time (Sec)
|Apex Medium
|no
|71,595
|7,563
|3,898
|2,135
|MILC
|NRF
|Apex Medium
|yes
|1x
|10x
|20x
|37x
NAMD
Molecular Dynamics
Designed for high-performance simulation of large molecular systems
VERSION
GPU and AMD CPU: V 3.0a13; Intel CPU: V 2.15a AVX512
ACCELERATED FEATURES
- Full electrostatics with PME and most simulation features
SCALABILITY
Capable of up to 100M atoms, multi-GPU, single node
MORE INFORMATION
http://www.ks.uiuc.edu/Research/namd/
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x T4 PCIe
|4x T4 PCIe
|8x T4 PCIe
|NAMD [apoa1_npt_cuda]
|Ave ns/day
|apoa1_npt_cuda
|yes
|19.15
|57
|113
|229
|NAMD [apoa1_npt_cuda]
|NRF
|apoa1_npt_cuda
|yes
|1x
|3x
|6x
|12x
|NAMD [apoa1_nptsr_cuda]
|Ave ns/day
|apoa1_nptsr_cuda
|yes
|19.59
|59
|117
|239
|NAMD [apoa1_nptsr_cuda]
|NRF
|apoa1_nptsr_cuda
|yes
|1x
|3x
|6x
|12x
|NAMD [apoa1_nve_cuda]
|Ave ns/day
|apoa1_nve_cuda
|yes
|20.75
|75
|149
|303
|NAMD [apoa1_nve_cuda]
|NRF
|apoa1_nve_cuda
|yes
|1x
|4x
|7x
|15x
|NAMD [stmv_npt_cuda]
|Ave ns/day
|stmv_npt_cuda
|yes
|1.87
|-
|9
|17
|NAMD [stmv_npt_cuda]
|NRF
|stmv_npt_cuda
|yes
|1x
|-
|5x
|9x
|NAMD [stmv_nptsr_cuda]
|Ave ns/day
|stmv_nptsr_cuda
|yes
|1.81
|5
|9
|17
|NAMD [stmv_nptsr_cuda]
|NRF
|stmv_nptsr_cuda
|yes
|1x
|3x
|5x
|10x
|NAMD [stmv_nve_cuda]
|Ave ns/day
|stmv_nve_cuda
|yes
|1.94
|-
|10
|20
|NAMD [stmv_nve_cuda]
|NRF
|stmv_nve_cuda
|yes
|1x
|-
|5x
|10x
RELION
Microscopy
Stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
VERSION
3.1.3
ACCELERATED FEATURES
- Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation
SCALABILITY
Multi-GPU and Single Node
MORE INFORMATION
https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x T4 PCIe
|4x T4 PCIe
|Relion [Plasmodium Ribosome]
|Total Wall Clock (Sec)
|MB numbers Plasmodium Ribosome on Relion-3.0
|no
|12,742
|3,586
|2,549
|Relion [Plasmodium Ribosome]
|NRF
|MB numbers Plasmodium Ribosome on Relion-3.0
|yes
|1x
|4x
|5x
SPECFEM3D
Geoscience
Simulates seismic wave propagation
VERSION
devel_fef2ace9
ACCELERATED FEATURES
- OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library
SCALABILITY
Multi-GPU and Single-Node
MORE INFORMATION
|Application
|Metric
|Test Modules
|Bigger is better
|Dual Cascade Lake 6240 (CPU-Only)
|2x T4 PCIe
|4x T4 PCIe
|8x T4 PCIe
|SPECFEM3D
|Total Time (Sec)
|four_material_simple_model
|no
|1,268
|239
|122
|64
|SPECFEM3D
|NRF
|four_material_simple_model
|yes
|1x
|5x
|12x
|23x