For over a decade, traditional industrial process modeling and simulation approaches have struggled to fully leverage multicore CPUs or acceleration devices to run simulation and optimization calculations in parallel. Multicore linear solvers used in process modeling and simulation have not achieved expected improvements, and in certain cases have underperformed optimized single-core solvers.

NVIDIA cuDSS is an optimized, first-generation GPU-accelerated direct sparse solver library for solving linear systems with very sparse matrices. It uses CUDA to parallelize matrix factorizations and solutions on GPUs. This solver was integrated into the Honeywell UniSim Design equation-oriented platform called UniSim EO.

Honeywell has a set of nonsymmetric matrices that were generated from UniSim EO process model applications. The set includes models of upstream and midstream oil and gas, refining, petrochemical, and chemical process units. Honeywell’s in-house sparse linear equation solver DOTAXB was used as the basis for comparison with cuDSS. The computing platform was a Microsoft Azure instance powered by an NVIDIA A100 Tensor Core GPU.

In this post, we’ll share how the NVIDIA cuDSS solver showed performance improvements of 2x to 78x over Honeywell’s existing sparse linear equation solver, which in the past has outperformed other solvers it was tested against, including multicore solvers.

## Industrial process simulation using Honeywell UniSim Design

Honeywell Connected Industrial (HCI) is a leader in the digital transformation of industries for a sustainable, safer, and smarter future. The division offers software solutions for industrial process applications. Honeywell made significant investments in their equation-oriented solution architecture that can use hardware platforms powered by multicore CPUs and GPUs to speed up linear system solutions.

Their process simulation product, UniSim Design, is a first-principles-based software package for the solution of general process flowsheet modeling problems. It is used in a variety of Honeywell applications, including process plant design, simulation and optimization of process operations, process performance monitoring, virtual sensors, data reconciliation, emissions management, process digital twins, and operator training simulators (Figure 1). It is also used for advanced process control application support.

Engineers using simulation for design, operational decision support, and advanced control support leverage UniSim Design to reduce their carbon footprint through new sustainability unit operations models and emissions calculation functionality.

UniSim Design contains an equation-oriented modeling capability called UniSim EO. This system enables efficient and reliable solutions of very large flowsheet models using an approach that simultaneously solves all the process model equations. However, the solution times for large models can limit productivity for engineers doing many design or operational case studies. It can also limit the scope size for process digital twin solutions, which often need to execute frequently and within a strict cycle time.

UniSim EO uses a large-scale constrained nonlinear equation solver that requires the solution of a system of linear equations at each iteration. The matrices for these systems are sparse and, for large models, can contain over 1 million nonzeros. In many cases, the factorization and solution can take up to 90% of the total solution time. Speeding up the matrix solution can therefore significantly improve performance for models of this size.
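To make the role of the linear solve concrete, here is a minimal sketch of a plain Newton iteration on a toy two-equation system. It is not the constrained solver used in UniSim EO. The names `residual`, `jacobian`, `solve_dense`, and `newton` are illustrative assumptions, and a small dense elimination stands in for a sparse direct solver.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial (row) pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Pick the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual(x):
    # Toy stand-in for process model equations: two nonlinear equations.
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] * x[1] - 1.0]


def jacobian(x):
    # Analytical Jacobian of residual().
    return [[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]]


def newton(x, tol=1e-12, max_iter=50):
    for iteration in range(max_iter):
        f = residual(x)
        if max(abs(v) for v in f) < tol:
            return x, iteration
        # One linear system J dx = -f per iteration; at scale this dominates the cost.
        dx = solve_dense(jacobian(x), [-v for v in f])
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton iteration did not converge")


solution, iterations = newton([2.0, 0.5])
```

Every pass through the loop factorizes and solves a new Jacobian system, which is why a faster linear solver shortens the whole nonlinear solve.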

## Testing NVIDIA cuDSS for industrial process simulation

Evaluating the linear system solution times on large sparse matrices generated from UniSim EO applications gives an accurate indication of expected performance improvements when using the NVIDIA software and hardware combination within a nonlinear equation or optimization solver. Honeywell has a suite of industrial test matrices, which are the Jacobian matrices from the linearized process model equations of the following form:

$$Ax = b$$

Here, $A$ is the sparse $n \times n$ Jacobian matrix, $b$ is the right-hand side vector, and $x$ is the solution vector.

Two linear solvers were compared: NVIDIA cuDSS and the Honeywell DOTAXB solver. DOTAXB is a single-core direct sparse matrix factorization and solution package that includes sparse LU factorization, block triangular factorization, and simultaneous analyze and factorize (numerical pivoting) phases.

For over three decades, DOTAXB has been the de facto standard used in the NOVA Optimization Solver, which is used for design optimization, real-time optimization, and nonlinear model predictive control applications in the process industries. It has also been used in the NOVA Nonlinear Equation Solver for the last decade for process simulation solutions. DOTAXB is highly optimized for process flowsheet modeling structures and solutions. To date, no multicore sparse linear algebra package has out-performed the DOTAXB solver on test cases.

Honeywell has a test system that reads the matrix data and performs the analyze, factorize, and solve steps to obtain a solution to the linear system of equations. The test system then generates performance data on the solution times for each step and the accuracy of the solution.
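A test driver of this kind can be sketched in a few lines. The following is a minimal illustration, not Honeywell's test system: the helper names are hypothetical, the matrix is a tiny upper-triangular example in CSR form, and the analyze and factorize phases are empty placeholders where a real solver call would go.

```python
import time


def csr_matvec(indptr, indices, data, x):
    """Multiply a CSR matrix by a dense vector."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def relative_residual(indptr, indices, data, x, b):
    """Return ||Ax - b||_inf / ||b||_inf as a simple accuracy measure."""
    ax = csr_matvec(indptr, indices, data, x)
    num = max(abs(p - q) for p, q in zip(ax, b))
    den = max(abs(q) for q in b) or 1.0
    return num / den


def upper_triangular_solve(indptr, indices, data, b):
    """Back substitution for an upper-triangular CSR matrix (stand-in for a real solve)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        diag, acc = None, b[i]
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            elif j > i:
                acc -= data[k] * x[j]
        x[i] = acc / diag
    return x


def run_timed(phases):
    """Run (name, callable) pairs in order and record wall-clock seconds for each."""
    timings = {}
    for name, fn in phases:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return timings


# A = [[4, 0, 0], [0, 2, 1], [0, 0, 5]] in CSR form.
indptr, indices, data = [0, 1, 3, 4], [0, 1, 2, 2], [4.0, 2.0, 1.0, 5.0]
b = [8.0, 5.0, 10.0]
result = {}
timings = run_timed([
    ("analyze", lambda: None),    # placeholder: reordering and symbolic analysis
    ("factorize", lambda: None),  # placeholder: numerical factorization
    ("solve", lambda: result.update(x=upper_triangular_solve(indptr, indices, data, b))),
])
accuracy = relative_residual(indptr, indices, data, result["x"], b)
```

Reporting per-phase timings alongside a residual-based accuracy figure keeps the speed comparison honest, since a fast but inaccurate solve would show up in `accuracy`.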

Details about the hardware and software basis of the testing environment are listed below:

- Windows 10 OS
- CUDA Toolkit 12.3
- NVIDIA DLL: cudss.dll v0.1.0
- UniSim DLL: dotaxb.dll R500
- GPU: NVIDIA A100 80 GB PCIe Tensor Core GPU
- CPU: AMD EPYC 7V13 64-Core Processor (2.44 GHz)
- Microsoft Azure Server
- All test matrices preprocessed with scaling and numerical zeros below a drop tolerance removed
- cuDSS matrix reordering option `CUDSS_ALG_1` and pivoting type `CUDSS_PIVOT_ROW`, which worked best for our matrices
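The preprocessing step in the list above (scaling, then removing numerical zeros) could look something like the following sketch. The post does not say which scaling method or drop tolerance was used, so row scaling by the largest absolute entry and the `drop_tol` default are assumptions, as is the function name `scale_and_drop`.

```python
def scale_and_drop(entries, n_rows, drop_tol=1e-12):
    """Row-scale a sparse matrix given as {(row, col): value}, then drop tiny entries.

    Assumed scheme: divide each row by its largest absolute entry, then remove
    entries whose scaled magnitude is at or below drop_tol.
    """
    row_max = [0.0] * n_rows
    for (i, _), v in entries.items():
        row_max[i] = max(row_max[i], abs(v))
    scaled = {}
    for (i, j), v in entries.items():
        s = v / row_max[i] if row_max[i] > 0.0 else v
        if abs(s) > drop_tol:
            scaled[(i, j)] = s
    # The caller must divide the right-hand side by the same row factors.
    return scaled, row_max


entries = {(0, 0): 2000.0, (0, 1): 1e-12, (1, 0): 0.5, (1, 1): -0.25}
scaled, row_factors = scale_and_drop(entries, n_rows=2)
```

In this example the entry at `(0, 1)` becomes a numerical zero after scaling and is removed, which reduces the number of nonzeros the solver has to handle.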

## Solution speedups for cold start and hot start

Cold start is a solution performed without previous factorization information. The total solution time is the sum of the Analyze, Factorize, and Solve times. Hot start is a solution using previous factorization information to reduce computation times. The total solution time for this mode is the sum of Refactorize and Solve times. The speedup metric is the ratio of the DOTAXB total solution time to the cuDSS total solution time.
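These definitions translate directly into a few helper functions. The sketch below uses hypothetical function names and applies the speedup ratio to the published total times for the lgcmpdis matrix from Tables 1 and 2. Per-phase timings are not published, so none are shown.

```python
def cold_start_total(analyze, factorize, solve):
    """Total cold start time: no prior factorization information is reused."""
    return analyze + factorize + solve


def hot_start_total(refactorize, solve):
    """Total hot start time: previous factorization information is reused."""
    return refactorize + solve


def speedup(dotaxb_total, cudss_total):
    """Ratio of DOTAXB total time to cuDSS total time; above 1 means cuDSS is faster."""
    return dotaxb_total / cudss_total


# Published total times in seconds for lgcmpdis (Tables 1 and 2).
cold_speedup = speedup(1421.68, 18.29)
hot_speedup = speedup(1421.68, 5.19)
print(f"cold: {cold_speedup:.1f}x, hot: {hot_speedup:.1f}x")
```

Because the tabulated times are rounded to two decimals, ratios recomputed this way can differ from the tabulated speedups in the last digit.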

Test results show that cuDSS is at least as fast as DOTAXB in all test cases, with equivalent accuracy (Tables 1 and 2). The narrowest margin is the cold start of catnaput, where both solvers take 0.25 seconds.

Sparse Solver Performance Speedup: Cold Start

| Matrix | n | nnz(A) | cuDSS (secs) | DOTAXB (secs) | Speedup (DOTAXB/cuDSS) |
| --- | ---: | ---: | ---: | ---: | ---: |
| lgcmpdis | 1,136,993 | 76,789,656 | 18.29 | 1421.68 | 77.7 |
| bsreoncp | 809,340 | 10,759,259 | 6.77 | 15.92 | 2.4 |
| catnaput | 91,896 | 558,409 | 0.25 | 0.25 | 1.0 |
| cndlckut | 54,789 | 468,490 | 0.04 | 0.14 | 3.1 |
| cpsbtfrc | 360,069 | 4,057,273 | 1.98 | 2.87 | 1.4 |
| dlhytrut | 194,406 | 586,646 | 0.21 | 0.73 | 3.6 |
| krhytrut | 159,260 | 443,687 | 0.10 | 0.31 | 3.2 |
| nphytrut | 158,361 | 423,116 | 0.09 | 0.32 | 3.5 |
| osothcut | 424,072 | 1,442,526 | 0.87 | 2.83 | 3.3 |
| otdlckut | 55,241 | 504,224 | 0.05 | 0.15 | 3.2 |
| hpdtcudm | 265,616 | 3,355,299 | 1.95 | 5.27 | 2.7 |
| tsrchcut | 605,693 | 1,987,192 | 1.32 | 4.55 | 3.5 |

*Table 1. Cold start sparse solver performance results*

Excluding the largest case, the average cuDSS speedup was 3x for cold start and 19x for hot start. The performance speedup on the largest test case was 78x for cold start and 274x for hot start, indicating superior scalability and efficiency on significantly larger matrices. This latter result is particularly important: it is a breakthrough in solver technology that will enable the practical solution and use of even larger flowsheet models in the future.

Sparse Solver Performance Speedup: Hot Start

| Matrix | n | nnz(A) | cuDSS (secs) | DOTAXB (secs) | Speedup (DOTAXB/cuDSS) |
| --- | ---: | ---: | ---: | ---: | ---: |
| lgcmpdis | 1,136,993 | 76,789,656 | 5.19 | 1421.68 | 274.0 |
| bsreoncp | 809,340 | 10,759,259 | 0.57 | 15.92 | 28.0 |
| catnaput | 91,896 | 558,409 | 0.05 | 0.25 | 5.2 |
| cndlckut | 54,789 | 468,490 | 0.01 | 0.14 | 13.5 |
| cpsbtfrc | 360,069 | 4,057,273 | 0.16 | 2.87 | 18.0 |
| dlhytrut | 194,406 | 586,646 | 0.04 | 0.73 | 19.5 |
| krhytrut | 159,260 | 443,687 | 0.03 | 0.31 | 12.2 |
| nphytrut | 158,361 | 423,116 | 0.03 | 0.32 | 12.7 |
| osothcut | 424,072 | 1,442,526 | 0.12 | 2.83 | 23.1 |
| otdlckut | 55,241 | 504,224 | 0.01 | 0.15 | 14.1 |
| hpdtcudm | 265,616 | 3,355,299 | 0.16 | 5.27 | 33.5 |
| tsrchcut | 605,693 | 1,987,192 | 0.17 | 4.55 | 27.3 |

*Table 2. Hot start sparse solver performance results*
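The averages quoted for the cases other than the largest can be reproduced from the speedup columns of Tables 1 and 2. The short check below copies the tabulated speedups. The helper name `mean_excluding` is an illustrative choice.

```python
from statistics import mean

# Speedup columns copied from Table 1 (cold start) and Table 2 (hot start).
cold = {
    "lgcmpdis": 77.7, "bsreoncp": 2.4, "catnaput": 1.0, "cndlckut": 3.1,
    "cpsbtfrc": 1.4, "dlhytrut": 3.6, "krhytrut": 3.2, "nphytrut": 3.5,
    "osothcut": 3.3, "otdlckut": 3.2, "hpdtcudm": 2.7, "tsrchcut": 3.5,
}
hot = {
    "lgcmpdis": 274.0, "bsreoncp": 28.0, "catnaput": 5.2, "cndlckut": 13.5,
    "cpsbtfrc": 18.0, "dlhytrut": 19.5, "krhytrut": 12.2, "nphytrut": 12.7,
    "osothcut": 23.1, "otdlckut": 14.1, "hpdtcudm": 33.5, "tsrchcut": 27.3,
}


def mean_excluding(speedups, excluded):
    """Arithmetic mean of the speedups, leaving out one named matrix."""
    return mean(v for name, v in speedups.items() if name != excluded)


cold_avg = mean_excluding(cold, "lgcmpdis")
hot_avg = mean_excluding(hot, "lgcmpdis")
print(f"cold start average: {cold_avg:.2f}x, hot start average: {hot_avg:.2f}x")
```

Leaving lgcmpdis out matters: with its 77.7x and 274.0x included, a single matrix would dominate both averages.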

## Benefits of faster process model solutions

The performance improvements delivered by cuDSS enable larger-scope first principles models to be solved in a reasonable amount of time for reliable process digital twin execution. cuDSS also eliminates the need for including surrogate or reduced model development in an application workflow to achieve faster solutions, which reduces maintenance concerns.

For design engineers, the cuDSS performance boost helps improve engineering productivity by enabling the completion of complex simulations faster. As a result, more design scenarios can be considered in the same time frame.

Hot start, which refactorizes without repeating the Analyze phase, delivered the largest speedups on the test matrices (5x to 274x in Table 2), but this feature should be used with caution for nonlinear equation solving in the process modeling domain. Depending on the initial condition, the pivot sequence may need to change from one iteration to the next, and forgoing a full factorization phase could lead to an inaccurate linear system solution.

However, with the right solver metric to enable use of the hot start at the appropriate time, the refactorization option could be used safely and could provide even more performance improvements for a future version of the Honeywell solver.
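One possible form of such a safeguard is sketched below, assuming the metric is the residual of the linear system. The post does not specify which metric Honeywell would use. All names here are hypothetical, and a 2x2 elimination with a cached pivot row stands in for refactorization with a reused pivot sequence.

```python
def solve_2x2(a, b, pivot_row):
    """Solve a 2x2 system by elimination, using pivot_row as the first pivot.

    Assumes the chosen pivot entry a[pivot_row][0] is nonzero.
    """
    p, q = pivot_row, 1 - pivot_row
    f = a[q][0] / a[p][0]
    a11 = a[q][1] - f * a[p][1]
    b1 = b[q] - f * b[p]
    x1 = b1 / a11
    x0 = (b[p] - a[p][1] * x1) / a[p][0]
    return [x0, x1]


def residual_norm(a, x, b):
    """Infinity norm of a x - b."""
    return max(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
               for row, bi in zip(a, b))


def choose_pivot(a):
    """Fresh pivot choice: the row with the larger entry in the first column."""
    return 0 if abs(a[0][0]) >= abs(a[1][0]) else 1


def guarded_solve(a, b, cached_pivot, tol=1e-8):
    """Try the hot start first; fall back to a cold start if the residual is too large."""
    x = solve_2x2(a, b, cached_pivot)        # hot start: reuse the old pivot sequence
    if residual_norm(a, x, b) <= tol:
        return x, cached_pivot, "hot"
    pivot = choose_pivot(a)                  # cold start: pick the pivot again
    return solve_2x2(a, b, pivot), pivot, "cold"


# Earlier iteration: the cached pivot sequence is still a good choice.
x_a, pivot_a, mode_a = guarded_solve([[2.0, 1.0], [1.0, 1.0]], [3.0, 2.0], cached_pivot=0)

# Later iteration: the old pivot entry has become tiny, so reusing it loses accuracy.
x_b, pivot_b, mode_b = guarded_solve([[1e-20, 1.0], [1.0, 1.0]], [1.0, 2.0], cached_pivot=pivot_a)
```

In the second call, reusing the stale pivot gives a solution with a residual of 1, so the guard discards it and pays for a full factorization instead. The residual check costs only one matrix-vector product, which is small compared with a factorization.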

## Conclusion

Honeywell is working to complete the productization of NVIDIA cuDSS as a linear solver option within the context of nonlinear equation solving and optimization in UniSim Design. This includes optimizing solver configuration for the process simulation domain and assessing improvements with different NVIDIA GPUs and new and emerging NVIDIA hardware.

The incorporation of cuDSS into Honeywell’s UniSim Design product will increase productivity for its process simulation user base, enable large process digital twins for operational decision support to solve reliably within their execution cycle, and create more time to explore better process designs. This will enable a reduction in capital costs, operating costs, and carbon footprint for new industrial facilities.

Download NVIDIA cuDSS and start exploring.

Want to learn more? Sign up for the NVIDIA GTC 2024 session with Honeywell, GPU-Accelerating Process Simulation Performance Using NVIDIA’s cuDSS Sparse Linear Systems Solver.