GTC Silicon Valley 2019, ID S9192: CUDA Implementation of Modern Preconditioning Techniques for Iterative Solvers of Linear Systems
Massimo Bernaschi (National Research Council of Italy)
Learn how to implement state-of-the-art preconditioners for iterative solvers of large-scale linear systems in CUDA. Until now, most preconditioners have been set up on CPUs because the setup phase was not considered suitable for fine-grain parallelization. We'll show that it is possible to implement efficient CUDA kernels for techniques such as the adaptive factorized sparse approximate inverse (adaptive FSAI) by adopting an approach that dramatically reduces the amount of memory required to run in parallel. We'll describe how our GPU-only preconditioners and solvers can be used to solve real-world problems in science and engineering, and we'll provide both single-GPU and multi-GPU implementations. Our method achieves about an order-of-magnitude speedup over high-end multi-core CPUs such as the Intel Xeon Platinum 8176.
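As background for the abstract above, the following is a minimal CPU-side sketch of a factorized sparse approximate inverse with a fixed (static) sparsity pattern. It is not the adaptive variant and not the speaker's CUDA implementation. It only illustrates the textbook construction in which each row of the lower-triangular factor G is obtained from its own small dense solve, so that G^T G approximates the inverse of a symmetric positive definite matrix A. The independence of the rows is what makes the setup phase a candidate for fine-grain parallelization. The function names (`solve_dense`, `fsai_static`, `apply_preconditioner`), the dense storage, and the 3x3 test matrix are assumptions made for illustration.

```python
from math import sqrt

def solve_dense(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        # Pick the largest pivot in column c and swap it into place.
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fsai_static(A, pattern):
    """Build a lower-triangular G with G^T G approximating inv(A), for SPD A.

    pattern[i] lists the allowed column indices of row i, sorted and ending with i.
    (Hypothetical interface; an adaptive scheme would grow these patterns itself.)
    """
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):  # rows are independent of each other
        P = pattern[i]
        sub = [[A[r][c] for c in P] for r in P]  # A restricted to the pattern
        rhs = [0.0] * len(P)
        rhs[-1] = 1.0                            # unit vector at the diagonal entry
        g = solve_dense(sub, rhs)
        scale = sqrt(g[-1])                      # makes diag(G A G^T) equal to 1
        for k, c in enumerate(P):
            G[i][c] = g[k] / scale
    return G

def apply_preconditioner(G, r):
    """Return z = G^T (G r): two matrix-vector products, no triangular solves."""
    n = len(r)
    t = [sum(G[i][j] * r[j] for j in range(n)) for i in range(n)]
    return [sum(G[i][j] * t[i] for i in range(n)) for j in range(n)]

# Small SPD test matrix (assumed example) with a lower-bidiagonal pattern for G.
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
pattern = [[0], [0, 1], [1, 2]]
G = fsai_static(A, pattern)
```

In an iterative solver such as conjugate gradient, `apply_preconditioner(G, r)` would be called once per iteration on the current residual. Because applying the preconditioner needs only matrix-vector products, it maps onto GPUs more naturally than preconditioners that require sparse triangular solves.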