VRWorks 360 Video is NVIDIA's GPU-accelerated video stitching SDK. It allows developers to take video feeds from camera rigs ranging in size from 2 to 32 cameras and stitch them into a single 360-degree panoramic video.

Hardware: Maxwell and Pascal based GPUs (GeForce GTX 900 series, Quadro M5000 and higher)
Software:
  • 64-bit Windows
  • NVIDIA graphics driver 375.86 or later
  • Microsoft Visual Studio 2015 (MSVC 14.0) or later
  • CMake 3.2 or later

360 video processing is computationally intensive, and the stitching process itself can be very complex. By implementing state-of-the-art algorithms and leveraging GPU acceleration, NVIDIA's VRWorks 360 Video SDK provides a high-performance, high-quality, low-latency implementation that can be easily integrated into 360 video workflows, enabling real-time capture, high-quality stitching, and streaming of 360 video.


NVIDIA VRWorks 360 Video SDK

The NVIDIA VRWorks 360 Video SDK consists of a library, a set of APIs, sample applications, and documentation.

The SDK will support both Mono and Stereo modes. The current beta supports Mono only. Stereo support is coming soon.

Download VRWorks 360 Video SDK Mono Beta


Key Features

  • Supports camera rigs with 2 to 32 cameras
  • Supports various types of fisheye lenses
  • Supports offline and real-time stitching
  • Supports audio stitching in AAC format in offline mode
  • Hardware-accelerated decoding via NVDEC and encoding via NVENC
Input:
  • MP4 files, RGBA files, or RGBA CUDA arrays
  • Video resolutions up to 4K
  • Includes a sample XML file format for specifying rig and camera parameters used in sample apps
Output:
  • Mono panorama
  • 3x2 cube map and equirectangular 360 projection formats
  • MP4 files, RGBA files, or RGBA OpenGL textures
  • Video resolutions up to 4K

Panoramas

Panoramas have gained popularity due to their ability to encompass a wide view of a physical space, capturing more visual information in a single picture, much like the human visual system does. The push towards virtual reality has created a growing need for panoramas with a 360 x 180 degree field of view to deliver a sense of immersion and presence.


In general, a photograph captures the light field at a single location, towards a preferred direction, projected onto a plane. In other words, the camera captures a set of rays that converge upon a specific point (the nodal point of the lens). Like a photograph, a panorama is a sampling of the light field at a specific location. However, it samples all directions: the full 360 degrees, to be precise.
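
To make "sampling all directions" concrete, the sketch below maps a unit viewing direction to a pixel in an equirectangular panorama, the 360 x 180 degree projection format mentioned above. It is a generic illustration with an assumed axis convention, not SDK code.

    #include <cmath>

    // Map a unit viewing direction (x, y, z) to pixel coordinates (u, v) in an
    // equirectangular panorama of size width x height.
    // Assumed convention (illustration only): +z forward, +x right, +y up.
    void directionToEquirect(float x, float y, float z,
                             int width, int height,
                             float* u, float* v)
    {
        const float kPi = 3.14159265f;
        float lon = std::atan2(x, z);   // longitude in [-pi, pi]
        float lat = std::asin(y);       // latitude  in [-pi/2, pi/2]
        *u = (lon / (2.0f * kPi) + 0.5f) * (float)width;   // 360 degrees across the width
        *v = (0.5f - lat / kPi) * (float)height;           // 180 degrees down the height
    }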


Stitching

Stitching is a common approach to generating a 360 degree panorama from a set of images. It involves aligning the images and combining them appropriately in the areas of overlap to produce a seamless 360 degree view. These images can be generated by rotating a single lens, or by using a rig with a sufficient number of cameras with a wide enough field of view to cover the 360 degree space. The same capability extends naturally to 360 video, which is produced by stitching many such panoramas in time sequence to enhance the sense of realism.

Complexity

As you can already tell, the complexity of the problem scales with the number of images. Targeting higher resolutions and refresh rates in the captured footage adds to the computational burden as well. Non-identical cameras in a rig, a lack of camera synchronization, and poor camera calibration increase the complexity even further. And how about doing all of this in real time?

VRWorks 360 Video SDK to the rescue

VRWorks 360 Video SDK is a GPU-accelerated solution designed to achieve real-time stitching of images into 360 degree panoramas. The Beta version of the VRWorks 360 Video SDK contains a simple test application that demonstrates its ease of use. The test application encapsulates a three-stage pipeline: ingest, stitch, and output.
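
The structure of that pipeline can be sketched as follows. This is not VRWorks API code; the types and stage functions are hypothetical stand-ins for the SDK's decode (NVDEC), stitch, and encode (NVENC) calls.

    #include <cstdint>
    #include <vector>

    // Hypothetical per-frame RGBA buffer; the real SDK works with MP4 streams,
    // RGBA files, or RGBA CUDA arrays.
    struct Frame { int width = 0, height = 0; std::vector<uint8_t> rgba; };

    // Placeholder stages, named for illustration only.
    Frame ingestFrame(int camera)                    { (void)camera; return Frame{}; }
    Frame stitchFrames(const std::vector<Frame>& in) { return in.empty() ? Frame{} : in[0]; }
    void  outputFrame(const Frame& panorama)         { (void)panorama; /* encode or display */ }

    int main()
    {
        const int numCameras = 4;    // rigs of 2 to 32 cameras are supported
        const int numFrames  = 100;
        for (int f = 0; f < numFrames; ++f) {
            std::vector<Frame> inputs;
            for (int c = 0; c < numCameras; ++c)
                inputs.push_back(ingestFrame(c));    // ingest: decode one frame per camera
            Frame panorama = stitchFrames(inputs);   // stitch: project + blend into a panorama
            outputFrame(panorama);                   // output: encode or hand off for display
        }
        return 0;
    }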

Exploring available options

Ingest
The VRWorks 360 Video SDK Beta can ingest MP4 compressed videos. It can also read RGBA files as well as CUDA arrays. The SDK natively supports HW-accelerated decoding of compressed input, and it currently supports up to 32 inputs.


Stitch
The VRWorks 360 Video SDK Beta supports two stitching modes: feathering and multiband blending. Feathering is a plain weighted averaging of collocated pixels; it is simple, quick, and efficient. Multiband blending combines the regions of overlap between images with careful consideration of the frequencies present, allowing for retention of detail, smooth seams, and minimal distortion.
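
As a concrete illustration of feathering (a generic sketch, not the SDK's implementation), the CUDA kernel below averages two overlapping RGBA images using a per-pixel weight that ramps from 0 to 1 across the overlap region:

    // Illustrative feathering kernel: blends two projected RGBA images over
    // their overlap. weight[i] = 0 keeps image A, weight[i] = 1 keeps image B,
    // and values in between produce the smooth transition across the seam.
    __global__ void featherBlend(const uchar4* a, const uchar4* b,
                                 const float* weight, uchar4* out,
                                 int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int i = y * width + x;
        float w = weight[i];
        out[i].x = (unsigned char)((1.0f - w) * a[i].x + w * b[i].x);
        out[i].y = (unsigned char)((1.0f - w) * a[i].y + w * b[i].y);
        out[i].z = (unsigned char)((1.0f - w) * a[i].z + w * b[i].z);
        out[i].w = 255;  // opaque alpha in the stitched output
    }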


Output
The VRWorks 360 Video SDK Beta supports various output formats: H.264 compressed streams, OpenGL textures, and RGBA files. Compression is HW-accelerated. Cube map and equirectangular projections are supported.




Multiband Blending

ROIs and Laplacian Image Generation
The first step of our multiband blending implementation is the computation of the Region of Interest (ROI) in the output buffer corresponding to each input camera feed. The next step is to project each input into the corresponding ROI. This computation takes the camera parameters and the desired resolution of the output into account. Thereafter, the Laplacian pyramid is generated for each of the projected inputs.
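
The following CPU sketch shows how such a Laplacian pyramid can be built. It is a simplified single-channel illustration (box filtering and nearest-neighbour upsampling stand in for the Gaussian filtering a production implementation would use), not the SDK's GPU code.

    #include <algorithm>
    #include <vector>

    // Simplified single-channel image; the SDK operates on RGBA CUDA buffers.
    struct Image { int w; int h; std::vector<float> px; };

    // Half-resolution downsample using a 2x2 box filter (simplification).
    Image downsample(const Image& in)
    {
        Image out{in.w / 2, in.h / 2, {}};
        out.px.resize((size_t)out.w * out.h);
        for (int y = 0; y < out.h; ++y)
            for (int x = 0; x < out.w; ++x)
                out.px[y * out.w + x] = 0.25f * (
                    in.px[(2 * y)     * in.w + 2 * x] + in.px[(2 * y)     * in.w + 2 * x + 1] +
                    in.px[(2 * y + 1) * in.w + 2 * x] + in.px[(2 * y + 1) * in.w + 2 * x + 1]);
        return out;
    }

    // Nearest-neighbour upsample back to (w, h), again a simplification.
    Image upsample(const Image& in, int w, int h)
    {
        Image out{w, h, std::vector<float>((size_t)w * h)};
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                int ys = std::min(y / 2, in.h - 1);
                int xs = std::min(x / 2, in.w - 1);
                out.px[y * w + x] = in.px[ys * in.w + xs];
            }
        return out;
    }

    // Laplacian level L_i = G_i - upsample(G_{i+1}); the coarsest level keeps
    // the remaining Gaussian image so the pyramid can be synthesized exactly.
    std::vector<Image> laplacianPyramid(const Image& base, int levels)
    {
        std::vector<Image> lap;
        Image g = base;
        for (int i = 0; i < levels - 1; ++i) {
            Image next = downsample(g);
            Image up   = upsample(next, g.w, g.h);
            Image diff{g.w, g.h, std::vector<float>((size_t)g.w * g.h)};
            for (size_t k = 0; k < diff.px.size(); ++k)
                diff.px[k] = g.px[k] - up.px[k];
            lap.push_back(diff);
            g = next;
        }
        lap.push_back(g);   // coarsest residual level
        return lap;
    }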



The projected frames are finally blended at each level using masks and the final output is synthesized.

Mask Generation

Masks determine the path that the seams will follow. The masks are computed at the base level, and a Gaussian pyramid of each mask is generated for blending at each level. The width of the region to be blended increases at each subsequent down-sampled level.
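
A per-level blend of two Laplacian levels with the corresponding level of the mask's Gaussian pyramid might look like this simplified, single-channel sketch (illustration only, not SDK code):

    #include <vector>

    // Per-level blend used in multiband blending: each Laplacian level of two
    // inputs is mixed with the matching level of the mask's Gaussian pyramid.
    // mask[i] = 1 keeps input A, mask[i] = 0 keeps input B; the mask therefore
    // decides where the seam runs at every level.
    std::vector<float> blendLevel(const std::vector<float>& lapA,
                                  const std::vector<float>& lapB,
                                  const std::vector<float>& mask)
    {
        std::vector<float> out(lapA.size());
        for (size_t i = 0; i < out.size(); ++i)
            out[i] = mask[i] * lapA[i] + (1.0f - mask[i]) * lapB[i];
        return out;
    }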

Number of Levels

All the pyramids used have the same number of levels. The current implementation computes the number of levels from the output buffer resolution such that at the lowest level the smallest surface dimension is no less than 16 pixels (capped at 8 levels).
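
The rule described above can be expressed directly in code; the following is a small sketch of that computation, not an excerpt from the SDK:

    #include <algorithm>

    // Number of pyramid levels: keep halving until the smaller dimension of
    // the output buffer would drop below 16 pixels, capped at 8 levels.
    int numPyramidLevels(int outWidth, int outHeight)
    {
        int levels = 1;
        int dim = std::min(outWidth, outHeight);
        while (dim / 2 >= 16 && levels < 8) {
            dim /= 2;
            ++levels;
        }
        return levels;
    }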

Sampling

Multiband blending is very sensitive to the type of filter used, both for downsampling and upsampling. The repeated upsampling and downsampling required can cause minor artifacts in the smaller levels to be greatly amplified as the pyramid is synthesized.
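
A common filter choice for pyramid construction is the 5-tap binomial kernel from Burt and Adelson's original pyramid work; whether the SDK uses this exact filter is not stated here, so treat the following as a generic illustration:

    #include <algorithm>
    #include <vector>

    // 5-tap binomial low-pass kernel, a common choice for pyramid filtering.
    static const float kKernel[5] = {1 / 16.f, 4 / 16.f, 6 / 16.f, 4 / 16.f, 1 / 16.f};

    // Horizontal pass of the separable low-pass filter with clamped borders;
    // a vertical pass over the result completes the 2D filtering.
    void lowPassRow(const std::vector<float>& in, std::vector<float>& out, int w)
    {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int t = -2; t <= 2; ++t) {
                int xs = std::min(std::max(x + t, 0), w - 1);
                acc += kKernel[t + 2] * in[xs];
            }
            out[x] = acc;
        }
    }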

CUDA Streams and Multi-GPU Scaling

Our multiband implementation maps very well to CUDA streams and is a good candidate for multi-GPU scaling. Most of the processing is performed on a per-camera basis, and only the final blending and synthesis stages require inputs from all of the camera pipelines.


Once the ROIs are generated, each camera pipeline projects its image into the base level of its image pyramid, generates the Gaussian and Laplacian pyramids, and then the levels are blended and the output is synthesized. Note that with this approach there is no need for synchronization until the blending stage, which means the per-camera CUDA streams can be executed on different GPUs.
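
A host-side sketch of that structure is shown below: one CUDA stream per camera, a single synchronization point before blending, and commented-out placeholders where the per-camera kernels and the blend/synthesis launches would go (those helper names are hypothetical).

    #include <cuda_runtime.h>
    #include <vector>

    // Per-camera CUDA streams: each camera's projection and pyramid work is
    // queued on its own stream, and the host synchronizes only once, right
    // before the blending stage that needs every camera's result.
    void stitchWithStreams(int numCameras)
    {
        std::vector<cudaStream_t> streams(numCameras);
        for (int c = 0; c < numCameras; ++c)
            cudaStreamCreate(&streams[c]);

        for (int c = 0; c < numCameras; ++c) {
            // Projection, Gaussian, and Laplacian pyramid kernels for camera c
            // would be enqueued asynchronously on streams[c] here.
            // launchPerCameraWork(c, streams[c]);   // hypothetical helper
        }

        // Single synchronization point before blending/synthesis, the only
        // stage that consumes data from all cameras.
        for (int c = 0; c < numCameras; ++c)
            cudaStreamSynchronize(streams[c]);

        // blendAndSynthesize();                     // hypothetical helper

        for (int c = 0; c < numCameras; ++c)
            cudaStreamDestroy(streams[c]);
    }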