Microsoft and TempoQuest Accelerate Wind Energy Forecasts with AceCAST

Accurate weather modeling is essential for companies to forecast renewable energy production and plan for natural disasters. Ineffective forecasts and unforecasted weather cost an estimated $714 billion in 2022 alone. To reduce these losses, companies need faster, cheaper, and more accurate weather models.

In a recent GTC session, Microsoft and TempoQuest detailed their work with NVIDIA to address this energy and climate issue. TempoQuest, a member of the NVIDIA Inception Program, enables hyper-local, low-latency weather and environmental forecasts. Our team is multi-disciplinary, covering atmospheric science, meteorology, HPC, AI, ML, engineering, and more. We have been a leading adopter of GPUs in the environmental sector, including:

  • First to port WRF to GPUs
  • First to create higher resolution forecasts faster and cheaper than CPU-based forecasts
  • First to develop a GPU software-as-a-service weather forecast system

In this blog post, we’ll share how TempoQuest used accelerated computing from NVIDIA on Microsoft Azure to move the traditional Weather Research and Forecasting (WRF) software to GPUs, deliver spatial resolutions finer than one kilometer and temporal resolutions of one minute to one hour, and enable faster predictions of power generated by renewable wind and solar resources.

Utility Challenges to Integrate Renewables

It’s challenging for utilities to manage their grid with renewable energy, primarily wind and solar. These energy sources vary depending on environmental factors, such as cloud coverage and wind speed. If renewable energy generation is insufficient to meet demand, utilities have to use “spinning reserves” — carbon-based electricity produced by generators — to make up the shortfall. Fast, accurate, and cost-effective weather forecasting is needed to better predict renewable energy generation.

Utilities manage the flow of electricity from generating stations through transmission and distribution lines to end customers.
Figure 1. Diagram of standard power grid infrastructure including generation, transmission, and distribution.

The energy industry has three key functions: power generation, transmission, and distribution. Power generation, which currently relies mainly on carbon-based fuels, is transitioning to renewables such as wind and solar to make progress towards net-zero emissions. In transmission, generated electricity passes through a step-up transformer and is carried over high-voltage lines. At the far grid edge, electricity is “stepped down” through transformers and substations to deliver power (240 V / 120 V) to consumers in homes and businesses.

Adding more renewables to the grid requires utilities not only to integrate new generation sites, but also to build more high-voltage transmission lines and towers. This adds complexity and cost, in both capital and operating expenses, to maintaining grids. High-resolution GPU-accelerated WRF can help by reducing reliance on carbon-based power and optimizing the use of renewable energy sources.

Accelerating WRF with GPUs

AceCAST, which stands for “Accelerated Forecast”, is a GPU-accelerated version of WRF (“Weather Research and Forecasting”), a regional model used by 50,000 users in 160 countries. We ported WRF to run on x86 systems with NVIDIA GPUs using a proprietary OpenACC and CUDA implementation that scales across multi-GPU and multi-node systems. AceCAST supports all major WRF dynamics, physics schemes, and namelist options, and is a drop-in replacement for existing WRF configurations.

There are multiple benefits of AceCAST, including faster time to solution, higher resolution and greater accuracy, greater awareness of localized weather phenomena, and reduced computational costs.

Our testing shows that GPU-based forecasts are faster, support higher resolution, and are more cost-effective than CPU-based forecasts for weather forecasting and predicting renewable power. This accelerated solution is important to reduce carbon power generation, enhance grid reliability and management, and lower energy costs for consumers.

AceCAST Validation and Performance-Cost Analysis

To validate our benchmarking results, we first ensured that differences between CPU WRF and GPU WRF output were within an acceptable tolerance. Then, we tested model performance across several temporal and spatial forecast ranges. Finally, we validated thousands of test cases to ensure AceCAST produced the same results as the CPU WRF. Running performance tests on Microsoft Azure revealed large differences in both performance and cost.
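As a rough illustration of the first validation step, the sketch below compares two output fields point by point against a relative tolerance. The function names, the sample temperature values, and the 1e-3 tolerance are assumptions made for this example; they are not TempoQuest's actual validation criteria or tooling.

```python
def max_relative_difference(reference, candidate, floor=1e-12):
    """Return the largest pointwise relative difference between two fields."""
    if len(reference) != len(candidate):
        raise ValueError("fields must have the same number of points")
    worst = 0.0
    for ref, cand in zip(reference, candidate):
        scale = max(abs(ref), floor)  # floor avoids dividing by zero
        worst = max(worst, abs(cand - ref) / scale)
    return worst


def within_tolerance(reference, candidate, rel_tol=1e-3):
    """True when every point of the candidate field is within rel_tol."""
    return max_relative_difference(reference, candidate) <= rel_tol


# Hypothetical 2 m temperature values (K) from a CPU run and a GPU run.
cpu_t2 = [288.15, 290.40, 285.72, 291.03]
gpu_t2 = [288.16, 290.39, 285.72, 291.05]

print("GPU output within tolerance:", within_tolerance(cpu_t2, gpu_t2))
```

In practice such a check would run over every output variable and forecast time, and the tolerance would be chosen per variable.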

CPU-based WRF – Standard_HB120rs_v3 VMs (HBv3):

  • 120 AMD EPYC™ 7V73X-series (Milan-X) CPU cores
  • 450 GB RAM (350 GB/sec memory bandwidth)
  • 200 Gb/sec HDR InfiniBand
  • 2 x 1 TB NVMe SSD disks
  • NCAR WRF 4.2.2
  • Uses Parallel netCDF
  • Compiled with Intel Compilers and MPI

GPU-accelerated WRF – Standard_ND96amsr_A100_v4 (NDmv4):

  • 8 NVIDIA A100 Tensor Core GPUs (80GB)
  • NVLink 3.0 and 200 Gb/s HDR InfiniBand
  • 96 AMD EPYC™ 7V12-series (Rome) CPU cores
  • 8 x 1 TB NVMe SSD disks
  • AceCAST 2.1
  • Proprietary implementation using OpenACC and CUDA
  • Scales on multi-node and multi-GPU using MPI

Azure Managed Lustre File System

  • 40 TiB of Azure-managed storage capacity
  • 10,000 MB/s max throughput

TempoQuest AceCAST, powered by NVIDIA A100 Tensor Core GPUs, provides ~9x acceleration at the same cost as CPU-based Weather Research and Forecasting (WRF) models.
Figure 2. Performance-to-cost analysis of TempoQuest AceCAST compared to CPU-based WRF.

Our results showed that GPU-accelerated WRF (AceCAST) on one node achieved ~9x acceleration compared to CPU-based WRF on one node, while 18 CPU nodes were needed to match the performance of one GPU node. These results are critical because faster, lower-cost weather forecasting enables utilities to more accurately predict renewable energy generation, deliver reliable power, and avoid excessive outages.
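The two figures in this paragraph can be read together as a statement about CPU scaling. The sketch below assumes that "18 CPU nodes match one GPU node" means 18 CPU nodes deliver roughly the same ~9x speedup over a single CPU node; that interpretation, and the function name, are assumptions for illustration.

```python
def parallel_efficiency(speedup, nodes):
    """Speedup actually achieved divided by the ideal (linear) speedup."""
    return speedup / nodes


gpu_speedup_vs_one_cpu_node = 9.0  # "~9x" from the text
cpu_nodes_to_match = 18            # from the text

efficiency = parallel_efficiency(gpu_speedup_vs_one_cpu_node, cpu_nodes_to_match)
print(f"Implied CPU scaling efficiency at 18 nodes: {efficiency:.0%}")
```

Under that reading, CPU-based WRF would be running at about half of linear scaling by the time it matches a single GPU node, which is one reason adding CPU nodes becomes costly.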

Additional testing with AceCAST 3.0.1 showed further performance gains. We used a nested domain with the outer domain composed of 5 million grid points (430x331x38v) and 15 km grid spacing, while the inner domain was composed of 80 million grid points (1551x1361x38v) and 3 km grid spacing.
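The rounded grid-point totals follow directly from the domain dimensions. The short calculation below multiplies them out, reading the "38v" suffix as 38 vertical levels; the helper name is only for illustration.

```python
def grid_points(nx, ny, nz):
    """Total number of grid points in an nx x ny x nz domain."""
    return nx * ny * nz


outer = grid_points(430, 331, 38)    # 15 km grid spacing
inner = grid_points(1551, 1361, 38)  # 3 km grid spacing

print(f"Outer domain: {outer:,} points")
print(f"Inner domain: {inner:,} points")
print(f"Inner/outer point ratio: {inner / outer:.1f}")
print(f"Nest refinement ratio: {15 // 3}:1")
```

The inner domain holds roughly 15 times as many points as the outer one, so it dominates the compute cost of the nested run.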

TempoQuest AceCAST, powered by NVIDIA A100 Tensor Core GPUs, runs 7% faster at 75% lower cost than CPU-based WRF models.
Figure 3. Performance-to-cost chart of TempoQuest AceCAST with optimal configuration to run a single job.

The results showed AceCAST runs 16.8x faster than WRF for inner domain compute and communication times on 1xNDmA100V4 (8 GPUs) compared to 1xHBv3 (64 CPUs). For a single job, the optimal configuration was found to be WRF on 16 HBv3 (CPU) VMs and AceCAST on 1 NDmA100 (GPU) VM with 8 GPUs. In this scenario, AceCAST runs 7% faster at 75% lower cost than CPU-based WRF.
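To see how the "7% faster at 75% lower cost" figures relate to VM pricing, the sketch below works backward to the price ratio they imply. It assumes each VM is billed per hour for the duration of the job, that "7% faster" means the CPU run takes 1.07 times as long as the GPU run, and that "75% lower cost" means the GPU job costs 25% of the CPU job. These are assumptions for illustration, not published Azure prices.

```python
def implied_price_ratio(cpu_vms, gpu_vms, speedup, cost_fraction):
    """Hourly price of one GPU VM relative to one CPU VM.

    Job cost = number of VMs * hourly price * run time, so
    cost_gpu / cost_cpu = (gpu_vms * price_gpu) / (cpu_vms * price_cpu * speedup).
    Solving for price_gpu / price_cpu gives the expression below.
    """
    return cost_fraction * cpu_vms * speedup / gpu_vms


ratio = implied_price_ratio(cpu_vms=16, gpu_vms=1, speedup=1.07, cost_fraction=0.25)
print(f"Implied GPU VM price: about {ratio:.2f}x one CPU VM")
```

Under these assumptions, one GPU VM priced at a little over four times a CPU VM would replace 16 CPU VMs while finishing slightly sooner.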

Renewable Power Prediction

Let’s close the loop with AceCAST applied to renewable power prediction. Utilities in the United States have specifications on all 70,000+ wind turbines and locations of every wind and solar node. By utilizing proprietary weather-to-power algorithms, AceCAST provides higher forecast resolution to enable accurate power prediction (MW) at specific renewable energy generation sites each day on an hourly basis.
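AceCAST's weather-to-power algorithms are proprietary, so the sketch below is not that method. It only illustrates the general idea with a textbook wind turbine power curve, where power grows with the cube of wind speed between cut-in and rated output. Every parameter value (rotor diameter, rated power, cut-in and cut-out speeds, power coefficient) and the sample wind speeds are hypothetical.

```python
import math


def turbine_power_mw(wind_speed, rotor_diameter_m=100.0, rated_mw=2.0,
                     cut_in=3.0, cut_out=25.0,
                     air_density=1.225, power_coefficient=0.40):
    """Estimate turbine output (MW) from hub-height wind speed (m/s)."""
    if wind_speed < cut_in or wind_speed >= cut_out:
        return 0.0  # too little wind, or shut down for safety
    swept_area = math.pi * (rotor_diameter_m / 2.0) ** 2
    watts = 0.5 * air_density * swept_area * power_coefficient * wind_speed ** 3
    return min(watts / 1e6, rated_mw)  # output is capped at rated power


# Hypothetical hourly hub-height wind speeds (m/s) from a forecast.
hourly_wind = [2.5, 6.0, 9.0, 14.0, 26.0]
hourly_mw = [turbine_power_mw(v) for v in hourly_wind]

for speed, power in zip(hourly_wind, hourly_mw):
    print(f"{speed:5.1f} m/s -> {power:.2f} MW")
```

Because power scales with the cube of wind speed, small errors in forecast wind speed become large errors in predicted power, which is why higher forecast resolution at each site matters.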

Decarbonizing the Grid

As generation assets transition from centralized carbon-based technology to clean distributed energy resources, the grid is challenged to manage supply and demand in real-time. Predicting renewable asset performance enables utilities to enhance grid reliability and resiliency. The collaboration between NVIDIA, Microsoft, and TempoQuest is helping address this major societal and global challenge.

Using AceCAST, its GPU-accelerated version of WRF, TempoQuest is accelerating power prediction for wind and solar resources at a lower cost. This helps optimize load and generation balance, reduce operating costs for utilities, manage fluctuations in renewable energy output, and produce more reliable forecasts to reduce reliance on carbon-based power reserves.

To dive deeper into accelerated computing, see the GPU-Accelerated Libraries forum.