VRWorks 360 Video is NVIDIA's GPU-accelerated video stitching SDK. It allows developers to take video feeds from camera rigs ranging in size from 2 to 32 cameras and stitch them into a single 360-degree panoramic video.
Hardware: Maxwell and Pascal based GPUs (GeForce GTX 900 series, and Quadro M5000 and higher)
Software: 64-bit Windows; NVIDIA graphics driver 375.86 or later; Microsoft Visual Studio 2015 (MSVC 14.0) or later; CMake 3.2 or later
360 video processing is computationally intensive, and the stitching process itself can be very complex. By implementing state-of-the-art algorithms and leveraging GPU acceleration, NVIDIA's VRWorks 360 Video SDK provides a high-performance, high-quality, low-latency implementation that can be easily integrated into 360 video workflows, enabling real-time capture, high-quality stitching, and streaming of 360 video.
The NVIDIA VRWorks 360 Video SDK consists of a library, a set of APIs, sample applications, and documentation.
The SDK will support both Mono and Stereo modes. The current beta supports Mono only. Stereo support is coming soon.
Panoramas have gained popularity due to their ability to encompass a wide view of a physical space, amassing more visual information in a single picture, akin to the abilities of the human visual system. The push towards virtual reality has given rise to an increasing need for panoramas with a 360 x 180 degree field of view to deliver a sense of immersion and presence.
In general, a photograph captures the light field at a single location, toward a preferred direction, projected onto a plane. In other words, the camera captures a set of rays that converge on a specific point (the nodal point of the lens). Like a photograph, a panorama is a sampling of the light field at a specific location. However, it samples all directions: all 360 degrees, to be precise.
The VRWorks 360 Video SDK Beta can ingest MP4 compressed videos. It can also read RGB files and CUDA arrays. The SDK inherently supports hardware-accelerated decode of compressed input, and it currently supports up to 32 inputs.
The VRWorks 360 Video SDK Beta supports two stitching modes: feathering and multiband blending. Feathering is a plain weighted average of collocated pixels; it is simple, quick, and efficient. Multiband blending combines the region of overlap between images with careful attention to the frequencies present, allowing for retention of detail, smooth seams, and minimal distortion.
The VRWorks 360 Video SDK Beta supports several output formats: H.264 compressed streams, OpenGL textures, and RGB files. Compression is hardware accelerated. Both cube-map and equirectangular projections are supported.
ROIs and Laplacian Image Generation
The first step of our multiband blending implementation is the computation of the Region of Interest (ROI) in the output buffer corresponding to each input camera feed. The next step is to project each input into the corresponding ROI. This computation takes the camera parameters and the desired resolution of the output into account. Thereafter, the Laplacian pyramid is generated for each of the projected inputs.
The projected frames are finally blended at each level using masks and the final output is synthesized.
Masks determine the path the seams will follow. The masks are computed at the base level, and a Gaussian pyramid of each mask is generated so that blending can be performed at every level. The width of the blended region increases at successive down-sampled levels.
Number of Levels
All the pyramids used have the same number of levels. The current implementation computes the number of levels from the output buffer resolution such that at the coarsest level the smallest surface dimension is no less than 16 pixels (capped at 8 levels).
Multiband blending is very sensitive to the type of filter used, both for downsampling and upsampling. The repeated upsampling and downsampling required can cause minor artifacts in the smaller levels to be greatly amplified as the pyramid is synthesized.
CUDA Streams and Multi-GPU Scaling
Our multiband implementation maps very well to CUDA streams, and it is a good candidate for multi-GPU scaling. Most of the processing is performed on a per-camera basis; only the final blending and synthesis stages require inputs from all of the camera pipelines.
Once the ROIs are generated, each camera pipeline projects its image into the base level of its image pyramid, then generates its Gaussian and Laplacian pyramids, after which the results are blended and synthesized. Note that with this approach there is no need for synchronization until the blending stage, which means the per-camera CUDA streams can execute on different GPUs.