Facing the Edge Data Challenge with HPC + AI

This post was updated April 2023.

Scientific instruments are being upgraded to deliver 10–100x more sensitivity and resolution over the next decade, requiring a corresponding scale-up in storage and processing. The data produced by these enhanced instruments will grow beyond what Moore’s law alone can address, challenging traditional operating models based solely on high-performance computing (HPC) in data centers.

The era in which edge computing relies on AI together with HPC to keep up with these enhanced capabilities has arrived.

This sentiment was echoed on May 30 in Hamburg, Germany, in a special address at the International Supercomputing Conference (ISC) by Dr. Ian Buck, NVIDIA vice president of hyperscale and HPC. Along with presenting this shift in perspective on HPC and AI for edge computing, the special address introduced a platform that aims to solve the dilemma of data-intensive HPC workloads at the edge: NVIDIA Holoscan.

Introducing the NVIDIA Holoscan platform for HPC Edge

The NVIDIA Holoscan platform has expanded to meet the specific needs of DevOps engineers, performance engineers, data scientists, and researchers working with these advanced edge instruments.

Modern real-time, edge AI applications are increasingly becoming multimodal. They involve high-speed IO, vision AI, imaging AI, graphics, streaming technologies, and more. Creating and maintaining these applications is extremely difficult. Scaling them is even harder.

NVIDIA is building the Holoscan SDK to address these challenges.

Diagram shows sensor data input to NVIDIA Holoscan architecture stack and photo results. — *Figure 1. NVIDIA Holoscan for HPC workflow*

Although Holoscan was initially targeted at healthcare, it is a universal computation and imaging platform built for high performance while meeting the size, weight, and power (SWaP) constraints of the edge.

The Holoscan platform has now been extended with an easy-to-use software framework that maximizes developer productivity while delivering maximum streaming data and computation performance. The platform is cloud-native and supports hybrid computing and data pipelining between edge locations and data centers. It is also architected for scalability, using network-aware optimizations and asynchronous computation.

The extended Holoscan platform delivers a flexible software stack that can run on embedded devices based on the NVIDIA Jetson AGX Xavier or Jetson AGX Orin. There is also a cloud-native version that runs on common high-performance hardware to accelerate data analysis and visualization workflows at the edge.

Introducing a framework for composing data processing pipelines with the Holoscan SDK

The finest minds in HPC and AI research are continuously developing faster and better algorithms to solve today’s most challenging problems. However, many developers find it challenging to port their models and codes to full-rate production, particularly when faced with high-rate streaming input and strict throughput and latency requirements.

An effective solution requires a wide range of skill sets, from data scientists to performance engineers, spanning multiple programming languages, hardware and software architectures, localities, and scaling rules. To ease the research-to-production burden while maintaining speed-of-light performance, NVIDIA uses an application framework composed of fragments, each of which is a directed acyclic graph (DAG) of operators.

Diagram shows a sensor application composed of two fragments connected in a pipeline. Each fragment is composed of operators that have input ports and output ports, and these are all connected to make up the application's complete data processing pipeline. — *Figure 2. Within Holoscan, HPC streaming data pipelines are standardized using fragments and operators to build modular, reusable pipelines for sensor data*
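The fragment/operator model described above can be sketched in a few lines of pure Python (3.9+, for `graphlib`). This is not the Holoscan SDK API: `Operator`, `Fragment`, `connect`, and `run_app` are hypothetical names chosen for illustration. Each fragment runs its operators in topological order, so an operator executes only after all of its upstream operators have produced output, and `run_app` chains two fragments in the spirit of Figure 2.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical classes for illustration only; these are NOT the Holoscan SDK API.


class Operator:
    """A named processing step that maps a dict of inputs to one output value."""

    def __init__(self, name: str, fn: Callable[[Dict[str, object]], object]):
        self.name = name
        self.fn = fn


class Fragment:
    """A directed acyclic graph (DAG) of operators."""

    def __init__(self, name: str):
        self.name = name
        self.ops: Dict[str, Operator] = {}
        self.upstream: Dict[str, Set[str]] = {}  # operator -> operators it consumes from

    def add(self, op: Operator) -> Operator:
        self.ops[op.name] = op
        self.upstream.setdefault(op.name, set())
        return op

    def connect(self, src: Operator, dst: Operator) -> None:
        # dst receives src's output, keyed by src's name.
        self.upstream[dst.name].add(src.name)

    def run(self, inputs: Dict[str, object]) -> Dict[str, object]:
        """Execute operators in topological order; source operators receive `inputs`."""
        results: Dict[str, object] = {}
        for name in TopologicalSorter(self.upstream).static_order():
            deps = self.upstream[name]
            args = {u: results[u] for u in deps} if deps else inputs
            results[name] = self.ops[name].fn(args)
        return results


def run_app(stages: List[Tuple[Fragment, str]], frame: object) -> object:
    """Chain fragments: each (fragment, sink operator) output feeds the next fragment."""
    value = frame
    for fragment, sink in stages:
        value = fragment.run({"frame": value})[sink]
    return value


# Fragment 1: acquire a frame and clip negative readings.
acq = Fragment("acquisition")
read = acq.add(Operator("read", lambda d: d["frame"]))
clip = acq.add(Operator("clip", lambda d: [max(0.0, x) for x in d["read"]]))
acq.connect(read, clip)

# Fragment 2: two independent operators fan in to a report operator.
ana = Fragment("analysis")
peak = ana.add(Operator("peak", lambda d: max(d["frame"])))
total = ana.add(Operator("total", lambda d: sum(d["frame"])))
report = ana.add(Operator("report", lambda d: {"peak": d["peak"], "total": d["total"]}))
ana.connect(peak, report)
ana.connect(total, report)

print(run_app([(acq, "clip"), (ana, "report")], [1.0, -2.0, 3.0]))
# {'peak': 3.0, 'total': 4.0}
```

Keeping each fragment a self-contained DAG is what makes the pieces reusable: the `analysis` fragment neither knows nor cares which fragment produced its input.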

NVIDIA Holoscan incorporates a network-aware, flexible, and performance-oriented streaming data framework that standardizes and simplifies cloud-to-edge production HPC and AI deployments for C++ and Python developers alike.

When you build an NVIDIA Holoscan pipeline, you specify the application data flow along with scaling and placement logic. The placement logic dictates which hardware a data flow runs on, and the scaling logic expresses how many parallel copies are needed to meet performance requirements.
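As a rough illustration of the scaling arithmetic, the sketch below pairs each stage with a placement label and computes the number of parallel copies as ⌈input rate / per-copy rate⌉. The `StageSpec` type, the placement labels, and the throughput numbers are hypothetical and do not correspond to how Holoscan represents placement or scaling. The calculation also assumes throughput scales linearly with the number of copies, which real pipelines only approximate.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical deployment spec for one pipeline stage (not a Holoscan type)."""

    name: str
    placement: str        # placement logic: which hardware runs this stage
    per_copy_rate: float  # frames per second one copy sustains (assumed measured)


def copies_needed(stage: StageSpec, input_rate: float) -> int:
    """Scaling logic: fewest parallel copies whose combined rate covers input_rate.

    Assumes throughput scales linearly with the number of copies.
    """
    if stage.per_copy_rate <= 0:
        raise ValueError("per_copy_rate must be positive")
    return max(1, math.ceil(input_rate / stage.per_copy_rate))


def plan(stages: List[StageSpec], input_rate: float) -> Dict[str, Tuple[str, int]]:
    """Map each stage name to (placement, number of copies) for a given input rate."""
    return {s.name: (s.placement, copies_needed(s, input_rate)) for s in stages}


stages = [
    StageSpec("preprocess", "edge-orin", per_copy_rate=4000.0),
    StageSpec("inference", "edge-orin", per_copy_rate=1500.0),
    StageSpec("archive", "datacenter-a100", per_copy_rate=10000.0),
]
print(plan(stages, input_rate=5000.0))
# {'preprocess': ('edge-orin', 2), 'inference': ('edge-orin', 4), 'archive': ('datacenter-a100', 1)}
```

Separating "where" (placement) from "how many" (scaling) means the same data flow can be re-planned for a faster detector by changing only the input rate.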

NVIDIA Holoscan easily integrates with both C++ and Python code along with the NVIDIA catalog of domain-specific SDKs.

The latest release is available for download. The Holoscan SDK has a central repository called HoloHub, where users and developers of extensions and applications for the Holoscan platform share reusable components and sample applications. For more information, see Developing Streaming Sensor Applications with HoloHub from NVIDIA Holoscan.

AI for visualization and imaging

NVIDIA Orin, a low-power system-on-chip based on the NVIDIA Ampere architecture, set new records in AI inference, raising the bar for per-accelerator performance at the edge. It ran up to 5x faster than the previous-generation Jetson AGX Xavier while delivering an average of 2x better energy efficiency.

Jetson AGX Orin is a key ingredient in Holoscan for HPC and in NVIDIA Clara Holoscan, a platform that system makers and researchers are using to develop next-generation AI instruments. Its powerful imaging computation capabilities and versatile software stack make it appealing for HPC edge use cases involving visualization and imaging.

With its JetPack SDK, Orin runs the full NVIDIA AI platform, a software stack already proven in the data center and the cloud. It is backed by a million developers using the NVIDIA Jetson platform.

The Advanced Photon Source (APS) at the US Department of Energy’s Argonne National Laboratory produces ultrabright, high-energy photon beams. These beams are 100 billion times brighter than a standard hospital X-ray machine and can capture images at the nanoscale and atomic scale. After its APS-U upgrade in 2024, the APS will generate photon beams up to 500x brighter than those of the current machine.

Diamond Light Source in Oxfordshire, UK, is a world-class synchrotron facility that is being upgraded to increase brightness and coherence by up to 20x across its existing beamlines, along with five new flagship beamlines. Data rates from Diamond already reach petabytes per month and, with the Diamond-II upgrade, are expected to grow by at least an order of magnitude.

Worldwide, more than 50 advanced light sources support the work of more than 16,000 research scientists, and many of these instruments are being upgraded as well. While all these advancements are remarkable in their own right, they depend on computational and data scientists having AI-enabled data processing applications ready to run on supercomputers at the edge.

PtychoNN: The APS edge computing platform

The APS is a machine about the size of a football field that produces photon beams. The beams are used to study materials, physics, and biological structures.

Today, one way of generating images of a material with nanoscale resolution is ptychography, a computationally intensive method to convert scattered X-ray interference patterns into images of the actual object.

To date, the method has required solving a challenging inverse problem: using forward and inverse Fourier transforms to iteratively compute the image of the object from the diffraction patterns observed in tens of thousands of X-ray measurements. Scientists can wait days for the resulting images.
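The iterative inversion described above can be illustrated with Fienup's classic error-reduction algorithm, a single-pattern ancestor of ptychographic solvers such as ePIE, which instead combine many overlapping probe positions. This is a minimal 1D, pure-Python sketch for illustration only: it is not the APS reconstruction code, and the test signal, support mask, and iteration count are made up. Each iteration needs one forward and one inverse Fourier transform, which hints at why the real 2D problem over tens of thousands of patterns is so expensive.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def error_reduction(measured_mag: Sequence[float], support: Sequence[bool],
                    iterations: int = 200, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Fienup-style error reduction: alternate between the Fourier-magnitude
    constraint and the object-domain (support + nonnegativity) constraint.

    Returns the final estimate and the normalized Fourier-magnitude error per iteration.
    """
    rng = random.Random(seed)
    n = len(measured_mag)
    norm = math.sqrt(sum(m * m for m in measured_mag))
    # Start from a guess that already satisfies the object constraints.
    g = [rng.random() if support[i] else 0.0 for i in range(n)]
    errors = []
    for _ in range(iterations):
        G = dft(g)  # forward transform
        errors.append(math.sqrt(sum((abs(Gk) - m) ** 2 for Gk, m in zip(G, measured_mag))) / norm)
        # Fourier constraint: keep the current phase, impose the measured magnitude.
        G = [m * Gk / abs(Gk) if abs(Gk) > 0 else complex(m) for Gk, m in zip(G, measured_mag)]
        g_prime = dft(G, inverse=True)  # inverse transform
        # Object constraint: real, nonnegative, and zero outside the support.
        g = [max(v.real, 0.0) if support[i] else 0.0 for i, v in enumerate(g_prime)]
    return g, errors


# Made-up 1D "object" that is zero outside its first six samples.
true_obj = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
support = [i < 6 for i in range(len(true_obj))]
measured = [abs(v) for v in dft(true_obj)]  # the "diffraction pattern": magnitudes only

estimate, errs = error_reduction(measured, support)
print(f"normalized error: {errs[0]:.3f} -> {errs[-1]:.2e}")
```

Because each step projects onto a constraint set, the error in `errs` cannot increase from one iteration to the next (a known property of error reduction), although it can stagnate. In 1D, the estimate may also match the true object only up to a shift or flip, since those produce identical Fourier magnitudes.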

Now, with AI, scientists can bypass much of the inversion process and view images of the object while the experiment is running, even potentially making adjustments on-the-fly.

APS scientists used a streaming ptychography pipeline, accelerated by PtychoNN, a deep convolutional neural network model, to speed up image processing by over 300x and reduce the data required to produce high-quality images by 25x.

Diagram shows that the high-performance inference model generates live images at the edge instrument, in this case an X-ray detector. The model is trained on a multi-node NVIDIA A100 cluster using data retrieved from the detector. — *Figure 3. Train the PtychoNN model at the data center on A100s and deploy the trained AI model at the beamline instrument with AGX Orin running PtychoNN to stream images 300x faster*

The PtychoNN model is trained on NVIDIA A100 Tensor Core GPUs using X-ray phase-retrieval data. The trained model can then run on an edge appliance, mapping incoming diffraction images directly to real-space images of the object in real time, in only milliseconds.
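Once training happens in the data center, the edge device is left with a streaming loop: take each frame as it arrives, run the trained model, and keep up with the detector. Below is a minimal pure-Python sketch of that loop under stated assumptions. A detector thread pushes frames into a bounded queue, so a slow model applies backpressure instead of exhausting memory, and the consumer applies a model and records per-frame latency. `stream_inference` and `placeholder_model` are hypothetical names; the placeholder is not PtychoNN, and a real deployment would run the trained network on the GPU (for example, inside a Holoscan operator). Python threads here only illustrate the dataflow, not GPU performance.

```python
import queue
import threading
import time
from typing import Callable, List, Sequence, Tuple


def stream_inference(frames: Sequence, model: Callable,
                     max_inflight: int = 4) -> Tuple[List, List[float]]:
    """Apply `model` to frames as they arrive; return outputs and per-frame latency (s)."""
    q: queue.Queue = queue.Queue(maxsize=max_inflight)  # bounded queue = backpressure
    done = object()  # sentinel marking the end of the stream

    def detector() -> None:
        # Stand-in for the detector: timestamps each frame as it is handed off.
        for frame in frames:
            q.put((time.perf_counter(), frame))
        q.put(done)

    producer = threading.Thread(target=detector, daemon=True)
    producer.start()
    outputs, latencies = [], []
    while True:
        item = q.get()
        if item is done:
            break
        arrived, frame = item
        outputs.append(model(frame))
        latencies.append(time.perf_counter() - arrived)  # includes time spent queued
    producer.join()
    return outputs, latencies


def placeholder_model(frame):
    # Hypothetical stand-in for a trained network such as PtychoNN.
    return [v ** 0.5 for v in frame]


outs, lat = stream_inference([[1.0, 4.0], [9.0, 16.0]], placeholder_model)
print(outs)  # [[1.0, 2.0], [3.0, 4.0]]
print(f"worst-case latency: {max(lat) * 1e3:.3f} ms")
```

Measuring latency from hand-off rather than from the start of inference is deliberate: queueing delay is what a scientist watching live images actually experiences.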

Faster sampling means more productive use of the instrument and more opportunities to investigate materials. It also enables capabilities that were not possible before, such as imaging biological samples that are easily damaged by the X-ray beam, samples that change rapidly, or samples that are large compared to the size of the X-ray beam.

A common hardware and software architecture simplifies orchestration across NVIDIA Jetson AGX devices at the edge and clusters of A100 GPUs in the data center. The solution is easily extensible to keep up with the 125x increase in data rate expected at the APS from a detector upgrade in 2022 and a facility upgrade in 2024.

“In order to make full use of what the upgraded APS will be capable of, we have to reinvent data analytics. Our current methods are not enough to keep up. Machine learning can make full use and go beyond what is currently possible.”
Mathew Cherukara, Argonne National Laboratory Computational Scientist

This workflow, using NVIDIA GPUs and PtychoNN, may serve as a model for many other light sources around the world, helping them accelerate scientific breakthroughs with real-time X-ray imaging.

In the preceding example, a single GPU edge device accelerates a stream of images using a trained neural network. Edge experiments with turnaround times that once took days can now take fractions of a second, giving researchers real-time, interactive use of their large-scale scientific instruments.

While many of our highlighted edge HPC applications are focused on streaming video and imaging pipelines, NVIDIA Holoscan can be extended to other sensor types with a variety of data formats and rates. Whether you are performing high-bandwidth spectrum analysis with a software-defined radio or monitoring telemetry from a power grid for anomalies, NVIDIA Holoscan is the platform of choice for software-defined instruments.
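To make the telemetry case concrete, here is a minimal sketch of a streaming anomaly detector based on a rolling z-score, a generic statistical technique chosen here for illustration; it is not a Holoscan component or API. The window size, threshold, and simulated 60 Hz grid-frequency samples are all assumptions. In a real pipeline, logic like this could run inside an operator downstream of the sensor input.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rolling_zscore_anomalies(samples: Iterable[float], window: int = 20,
                             threshold: float = 4.0) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for each sample whose z-score relative to the
    preceding `window` samples has magnitude >= threshold."""
    history: Deque[float] = deque(maxlen=window)
    for i, x in enumerate(samples):
        if len(history) == window:
            mean = sum(history) / window
            std = math.sqrt(sum((h - mean) ** 2 for h in history) / window)
            if std > 0:  # a perfectly flat window has no meaningful z-score
                z = (x - mean) / std
                if abs(z) >= threshold:
                    yield i, x, z
        history.append(x)  # in this simple version, anomalies also enter the window


# Simulated grid-frequency telemetry (Hz): small alternating jitter plus one spike.
samples = [60.0 + (0.01 if i % 2 else -0.01) for i in range(50)]
samples[30] = 60.5

for i, x, z in rolling_zscore_anomalies(samples):
    print(f"sample {i}: {x} Hz (z = {z:.1f})")
```

A fixed-size window keeps memory and per-sample cost constant for an unbounded stream; a production detector would likely exclude flagged samples from the window and tune the threshold to the sensor.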

By focusing on developer productivity and application performance regardless of sensor type, HPC at the edge can deliver real-time analytics and help ensure mission success.

Featured image courtesy of US Department of Energy’s Argonne National Laboratory, Advanced Photon Source (APS)