GTC-DC 2019: Accelerated Simulation of Air Pollution Using NVIDIA RAPIDS

Christoph A. Keller, NASA GMAO / USRA
We’ll discuss the potential to accelerate the numerical simulation of air quality using machine learning. Air pollution is a major environmental risk factor and a common cause of premature death globally. However, addressing the issue is difficult because numerical air quality models are computationally expensive, owing to the complexity of the chemical processes that form air pollution. We’ll discuss our use of Dask-cuDF and Dask-XGBoost on the NVIDIA RAPIDS platform to generate gradient-boosted tree models that can simulate the formation of air pollution with reasonable accuracy. We’ll explain how recent advances in Dask-XGBoost enabled us to increase the size of the training set, and why this is necessary for improving model skill. The boosted tree models, trained on the NASA Center for Climate Simulation’s Advanced Data Analytics Platform, can be coupled with a full Earth system model such as NASA’s Goddard Earth Observing System Model. This coupling will provide a detailed global simulation of air quality at a much lower computational cost.
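To make the idea of a gradient-boosted tree emulator concrete, here is a minimal sketch in pure standard-library Python. It is not the Dask-cuDF / Dask-XGBoost workflow described in the talk, and it does not run on a GPU. It only illustrates the underlying technique: each boosting round fits a small regression tree (here a one-split "stump") to the residuals of the current model. The feature names (nox, voc, sun), the function toy_tendency, and the synthetic data are hypothetical stand-ins, not real atmospheric chemistry or output from any NASA model.

```python
import math
import random


def fit_stump(X, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            right_sum = total - left_sum
            # Maximising this score is equivalent to minimising squared error.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, 0.5 * (lo + hi), left_sum / k, right_sum / (n - k))
    _, feature, threshold, left_value, right_value = best
    return feature, threshold, left_value, right_value


def predict_stump(stump, x):
    feature, threshold, left_value, right_value = stump
    return left_value if x[feature] < threshold else right_value


def fit_gbt(X, y, n_rounds=60, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, X)]
    return base, learning_rate, stumps


def predict_gbt(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def toy_tendency(nox, voc, sun):
    """Hypothetical nonlinear 'chemistry' target; NOT a real reaction mechanism."""
    return sun * nox / (nox + 0.2) + 0.3 * voc


def rmse(pred, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


rng = random.Random(0)


def sample(n):
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(n)]
    return X, [toy_tendency(*x) for x in X]


X_train, y_train = sample(300)
X_test, y_test = sample(200)

model = fit_gbt(X_train, y_train)

# Compare the emulator against always predicting the training mean.
mean_y = sum(y_train) / len(y_train)
rmse_baseline = rmse([mean_y] * len(y_test), y_test)
rmse_model = rmse([predict_gbt(model, x) for x in X_test], y_test)
print(f"baseline RMSE: {rmse_baseline:.4f}  boosted-stump RMSE: {rmse_model:.4f}")
```

In practice one would use a library such as XGBoost with deeper trees, and a distributed GPU data pipeline for training sets far larger than this toy example. The sketch only shows how a boosted model learns a mapping from input concentrations to a chemical tendency that a host model could call in place of a costly solver.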