Supporting Low-Latency Streaming Video for AI-Powered Medical Devices with Clara Holoscan

NVIDIA Clara Holoscan provides a scalable medical device computing platform for developers to create AI microservices and deliver insights in real time. The platform optimizes every stage of the data pipeline, from high-bandwidth data streaming and physics-based analysis to accelerated AI inference and graphics visualization.

The NVIDIA Clara AGX Developer Kit, which is now available, combines the efficient Arm-based embedded computing of the AGX Xavier SoC with the powerful NVIDIA RTX 6000 GPU and the 100 GbE connectivity of the NVIDIA ConnectX-6 network processor. This brings real-time AI acceleration to the next generation of intelligent, software-defined, embedded medical devices. Developers using the Clara AGX Developer Kit for surgical video applications—such as AI-enhanced endoscopy, laparoscopy, or other minimally invasive procedures—require the minimum possible end-to-end latency in their video processing path. Customers can use the Clara Holoscan SDK v0.1 on the Clara AGX Developer Kit today and on the next-generation developer kit in the second half of 2022.

Surgical video demands consistently low latency between the image captured by the endoscope and the image displayed on the monitor. This gives surgeons real-time control of their tools and real-time monitoring of the patient.

In a typical endoscopy system, the image is digitized at the camera sensor in the endoscope, serialized by an FPGA or ASIC, and transmitted to a video processor, where it is written to an input frame buffer, processed, written to an output frame buffer, and then transmitted serially to the monitor. Each of these steps adds latency to the video pipeline. Developers who want to add advanced GPU-accelerated AI processing then face additional transfer latency, because the data must be written from the video capture card to system memory and then transferred via the CPU and PCIe bus to the GPU.

GPU compute performance is a key component of the NVIDIA Clara Holoscan platform. To optimize GPU-based video processing applications, NVIDIA has partnered with AJA Video Systems to integrate their line of video capture cards with the Clara AGX Developer Kit. AJA provides a wide range of proven, professional video I/O devices. The partnership between NVIDIA and AJA has led to the addition of Clara AGX Developer Kit support in the AJA NTV2 SDK and device drivers as of the NTV2 SDK 16.1 release.

The AJA drivers and SDK now offer GPUDirect support for NVIDIA GPUs. This feature uses remote direct memory access (RDMA) to transfer video data directly from the capture card to GPU memory. Because copies from system memory to the GPU are eliminated from the processing pipeline, both latency and system PCIe bandwidth use drop significantly for GPU video processing applications.

RDMA support has also been incorporated into the AJA GStreamer plug-in to enable zero-copy GPU buffer integration with the DeepStream SDK. DeepStream applications can now process video data along the entire pipeline, from initial capture to final display, without leaving GPU memory.

NVIDIA Clara Holoscan SDK v0.1 builds on the features of the previous Clara AGX SDK and adds tools for detailed measurement of video transfer latency between video I/O cards, the CPU, and the GPU. Users can measure latency in various configurations, identify bottlenecks, and optimize their workflows for minimum end-to-end latency.

Data transfer latency was measured using the Clara AGX Developer Kit with an AJA capture card using the internal PCIe Gen3 x8 connection. The following tables demonstrate the latency reduction that can be achieved using GPUDirect. 

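| Format | Width (pixels) | Height (pixels) | Bytes per pixel | Frame rate (fps) |
|---|---|---|---|---|
| 720p YUV | 1280 | 720 | 2 | 60 |
| 1080p YUV | 1920 | 1080 | 2 | 60 |
| 4K UHD YUV | 3840 | 2160 | 2 | 60 |
| 720p RGBA | 1280 | 720 | 4 | 60 |
| 1080p RGBA | 1920 | 1080 | 4 | 60 |
| 4K UHD RGBA | 3840 | 2160 | 4 | 60 |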
Table 1. Video formats tested.
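
To give a sense of how much data each format in Table 1 represents, the following minimal sketch computes the uncompressed frame size and the sustained data rate at the listed frame rate. It assumes that the two trailing values in each row are bytes per pixel and frames per second; the function names are illustrative and are not part of the Clara Holoscan or AJA SDKs.

```python
# Formats from Table 1: (name, width, height, bytes per pixel, frames per second).
# Reading the last two columns as bytes per pixel and frame rate is an assumption.
FORMATS = [
    ("720p YUV", 1280, 720, 2, 60),
    ("1080p YUV", 1920, 1080, 2, 60),
    ("4K UHD YUV", 3840, 2160, 2, 60),
    ("720p RGBA", 1280, 720, 4, 60),
    ("1080p RGBA", 1920, 1080, 4, 60),
    ("4K UHD RGBA", 3840, 2160, 4, 60),
]


def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def stream_rate_mb_per_s(width, height, bytes_per_pixel, fps):
    """Sustained data rate of the uncompressed stream in MB/s (1 MB = 10**6 bytes)."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / 1e6


if __name__ == "__main__":
    for name, width, height, bpp, fps in FORMATS:
        size_mb = frame_bytes(width, height, bpp) / 1e6
        rate = stream_rate_mb_per_s(width, height, bpp, fps)
        print(f"{name:12s} {size_mb:6.2f} MB/frame  {rate:7.1f} MB/s")
```

Under these assumptions, a 4K UHD RGBA stream is roughly 33 MB per frame, or about 2 GB/s at 60 fps, which is why every extra copy of the frame through system memory is costly.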

The total time for video data transfer to and from the GPU, as well as time remaining for processing in the GPU, was then measured with and without GPUDirect enabled:

| Format | Transfer time without GPUDirect, no processing (ms) | Time remaining for processing without GPUDirect (ms) | Transfer time with GPUDirect, no processing (ms) | Time remaining for processing with GPUDirect (ms) |
|---|---|---|---|---|
| 720p YUV | 1.945 | 14.721 | 0.956 | 15.710 |
| 1080p YUV | 3.865 | 12.801 | 1.723 | 14.943 |
| 4K UHD YUV | 12.805 | 3.861 | 6.256 | 10.410 |
| 720p RGBA | 3.451 | 13.215 | 1.548 | 15.118 |
| 1080p RGBA | 6.816 | 9.850 | 3.225 | 13.444 |
| 4K UHD RGBA | 23.686 | -7.020 | 12.406 | 4.260 |
Table 2. Latency (ms) with and without GPUDirect.

Note that GPUDirect cuts transfer time approximately in half by removing the need for writes to system memory. With GPUDirect, a 4K UHD RGBA frame is transferred in 12.406 ms, which fits within the 16.666 ms frame time at 60 fps and leaves about 4.3 ms for processing. Without GPUDirect, the transfer alone takes 23.686 ms, so this format cannot be sustained at 60 fps. As a result, uncompressed high-resolution video can be natively alpha-blended with overlays from AI workflows, with no need for conversion from YUV to RGBA formats and no compromise in the 60 fps frame rate.
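
The "time remaining" columns in Table 2 follow from subtracting the measured transfer time from the 60 fps frame time. The sketch below reproduces that arithmetic for the two 4K UHD formats, using the transfer times from Table 2 and the 16.666 ms frame time quoted above. The helper names are illustrative only, not part of any SDK.

```python
# Frame time at 60 fps as quoted in the text (1000 / 60, truncated to three decimals).
FRAME_BUDGET_MS = 16.666

# Transfer times from Table 2: format -> (without GPUDirect, with GPUDirect), in ms.
TRANSFER_MS = {
    "4K UHD YUV": (12.805, 6.256),
    "4K UHD RGBA": (23.686, 12.406),
}


def time_remaining_ms(transfer_ms, budget_ms=FRAME_BUDGET_MS):
    """Time left for GPU processing in one frame period after the transfer."""
    return round(budget_ms - transfer_ms, 3)


def fits_in_frame(transfer_ms, budget_ms=FRAME_BUDGET_MS):
    """True if the transfer alone completes within one frame period."""
    return transfer_ms < budget_ms


if __name__ == "__main__":
    for name, (without_gd, with_gd) in TRANSFER_MS.items():
        ratio = with_gd / without_gd
        print(
            f"{name}: remaining {time_remaining_ms(without_gd)} ms without GPUDirect, "
            f"{time_remaining_ms(with_gd)} ms with GPUDirect "
            f"(transfer time ratio {ratio:.2f})"
        )
```

A negative result, as for 4K UHD RGBA without GPUDirect, means the transfer alone overruns the frame period, so frames would be dropped before any processing could take place.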

For instructions on how to set up and use an AJA device with the Clara AGX Developer Kit, including RDMA and DeepStream integration, go to Chapter 9 of the Clara Holoscan SDK User Guide.