AV1 Encoding and FRUC: Video Performance Boosts and Higher Fidelity on the NVIDIA Ada Architecture

Announced at GTC 2022, the next generation of NVIDIA GPUs (the NVIDIA GeForce RTX 40 series, NVIDIA RTX 6000 Ada Generation, and NVIDIA L40 for data center) is built on the new NVIDIA Ada Architecture.

The NVIDIA Ada Architecture features third-generation ray tracing cores, fourth-generation Tensor Cores, multiple video encoders, and a new optical flow accelerator.

To enable you to fully harness the new hardware upgrades, NVIDIA is announcing accompanying updates to the Video Codec SDK and Optical Flow SDK.

NVIDIA Video Codec SDK 12.0

AV1 is the state-of-the-art video coding format that offers both substantial performance boosts and higher fidelity compared to H.264, the popular standard. The Video Codec SDK added support for AV1 decoding with the NVIDIA Ampere Architecture. Now, with Video Codec SDK 12.0, NVIDIA Ada-generation GPUs also support AV1 encoding.

Line chart of PSNR by bit rate shows that AV1 supports higher-quality video at a lower bit rate compared to H.264.
Figure 1. PSNR compared to bit rate for AV1 and H.264

Hardware-accelerated AV1 encoding is a huge milestone in transitioning AV1 to be the new standard video format. Figure 1 shows how the AV1 bit-rate savings translate into impressive performance boosts and higher fidelity images.

PSNR (peak signal-to-noise ratio) is a video quality measure. To achieve 42 dB PSNR, AV1 video needs a bit rate of 7 Mbps, while H.264 needs upwards of 12 Mbps. Across all resolutions, AV1 encoding is on average 40% more efficient than H.264. This fundamental performance difference opens the doors for AV1 to support higher-quality video, increased throughput, and high dynamic range (HDR).
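
To make the PSNR and bit-rate numbers concrete, here is a minimal Python sketch. It assumes 8-bit samples (peak value 255) and frames given as flat lists of pixel values. The function names `psnr` and `bitrate_saving` are illustrative and are not part of the Video Codec SDK.

```python
import math


def psnr(reference, distorted, peak=255):
    """Return the PSNR in dB between two equal-length lists of samples."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of samples")
    # Mean squared error between the original and the encoded frame.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def bitrate_saving(new_mbps, old_mbps):
    """Return the fraction of bit rate saved at equal quality."""
    return 1 - new_mbps / old_mbps


# Bit rates quoted above for 42 dB PSNR: 7 Mbps (AV1) and 12 Mbps (H.264).
saving = bitrate_saving(7, 12)
```

With the 42 dB figures quoted above, the saving works out to roughly 42%, which is in line with the 40% average.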

Bar chart shows that at 2160p, AV1 has a 1.45x bit-rate saving compared to NVENC H.264.
Figure 2. Bit-rate saving for AV1 compared to H.264

As Figure 2 shows, at 1440p and 2160p, NVENC AV1 is 1.45x more efficient than NVENC H.264. This new performance headroom enables higher-than-ever image quality, including 8K.

The benefits of AV1 are best used in unison with the multi-encoder design featured on the NVIDIA Ada Architecture. New in Video Codec SDK 12.0, on chips with multiple NVENC units, the processing load is distributed evenly across all encoders, which work simultaneously. This optimization creates a huge reduction in encoding times. Multiple encoders in combination with the AV1 format allow NVIDIA Ada to support an incredible 8K at 60 fps video encode in real time.
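
As a rough back-of-the-envelope illustration of why splitting the load helps, the sketch below computes the pixel throughput that real-time 8K at 60 fps implies and the share each encoder would handle under a perfectly even split. It assumes 8K means 7680x4320, and `num_encoders` is a hypothetical parameter. The sketch does not model how the SDK actually divides the work.

```python
def pixel_rate(width, height, fps):
    """Pixels per second that must be encoded to keep up in real time."""
    return width * height * fps


def per_encoder_rate(width, height, fps, num_encoders):
    """Pixels per second per encoder, assuming a perfectly even split."""
    if num_encoders < 1:
        raise ValueError("num_encoders must be at least 1")
    return pixel_rate(width, height, fps) / num_encoders


# Assumed 8K resolution of 7680x4320 at 60 fps.
total = pixel_rate(7680, 4320, 60)
# Hypothetical two-encoder configuration.
share = per_encoder_rate(7680, 4320, 60, num_encoders=2)
```

Under these assumptions the total is just under 2 billion pixels per second, and each of two encoders would handle half of that.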

AV1 encoding across multiple hardware NVENC is enabling the next generation of video performance and fidelity. Broadcasters can achieve higher livestream resolutions, video editors can export video at 2x speed, and all this is enabled by the Video Codec SDK.

NVIDIA Video Codec SDK 12.0 will be available to download from the NVIDIA Developer Center in October 2022.

NVIDIA Optical Flow SDK 4.0

The new NVIDIA Optical Flow SDK 4.0 release introduces engine-assisted frame rate up conversion (FRUC). FRUC generates higher frame-rate video from lower frame-rate video by inserting interpolated frames using optical flow vectors. The resulting video shows smooth continuity of motion across frames, which improves both the smoothness of playback and the perceived visual quality.

The NVIDIA Ada Lovelace Architecture has a new optical flow accelerator, NVOFA, that is 2.5x more performant than the NVIDIA Ampere Architecture NVOFA. It provides a 15% quality improvement on popular benchmarks including KITTI and MPI Sintel.

The FRUC library uses the NVOFA and CUDA to interpolate frames significantly faster than software-only methods. It also works seamlessly with custom DirectX or CUDA applications, making it easy for developers to integrate.

Diagram shows four frames of a low frame-rate video being interleaved with interpolated frames to make a high frame-rate video.
Figure 3. Frame rate up conversion

The Optical Flow SDK 4.0 includes the FRUC library and sample application, in addition to basic Optical Flow sample applications. The FRUC library exposes NVIDIA FRUC APIs that take two consecutive frames and return an interpolated frame in between them. These APIs can be used for the up-conversion of any video.

Frame interpolation using the FRUC library is extremely fast compared to software-only methods. The APIs are easy to use and support ARGB and NV12 input surface formats. The library can be directly integrated into any DirectX or CUDA application.
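
For a sense of what the two input surface formats mean in memory, here is a small sketch of their tightly packed sizes. It assumes 8 bits per channel and no row padding (real GPU surfaces are usually pitch-aligned, so actual allocations can be larger). The helper names are illustrative only and do not come from the SDK.

```python
def argb_size_bytes(width, height):
    """ARGB: four 8-bit channels per pixel."""
    return width * height * 4


def nv12_size_bytes(width, height):
    """NV12: full-resolution luma plane plus interleaved half-resolution UV plane."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    luma = width * height
    chroma = (width // 2) * (height // 2) * 2  # one U and one V byte per 2x2 block
    return luma + chroma


argb_1080p = argb_size_bytes(1920, 1080)
nv12_1080p = nv12_size_bytes(1920, 1080)
```

For a 1920x1080 frame, NV12 takes 1.5 bytes per pixel compared to 4 bytes per pixel for ARGB.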

The sample application source code included in the SDK demonstrates how to use FRUC APIs for video FRUC. This source code can be reused or modified as required to build a custom application.

The Video 1 sample was created using the FRUC library. As you can see, the motion of foreground objects and background appears much smoother in the right video compared to the left video.

Video 1. Side-by-side comparison of original video and frame rate up-converted video. (left) Original video played at 15 fps. (right) Frame rate up-converted video played at 30 fps. Video created using the FRUC library. (Source: http://ultravideo.fi/#testsequences)

Inside the FRUC library

Here is a brief explanation of how the FRUC library processes a pair of frames and generates an interpolated frame.

A pair of consecutive frames (previous and next) is input into the FRUC library (Figure 4).

GIF image shows the previous and next frames of a horse and rider.
Figure 4. Consecutive frames used as input

Using NVIDIA Optical Flow APIs, forward and backward flow vector maps are generated.

GIF image shows forward and backward flow vector maps.
Figure 5. Forward and backward flow vector maps

Flow vectors in the map are then validated using a forward-backward consistency check. Flow vectors that do not pass the consistency check are rejected. The black portions in Figure 6 are flow vectors that did not pass the forward-backward consistency check.

Picture shows black spots for rejected flow vectors.
Figure 6. Validated and rejected flow vectors
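
A forward-backward consistency check can be sketched as follows. This is a simplified illustration, not the FRUC library's implementation: flow maps are assumed to be 2D lists of (dx, dy) tuples, the target position is rounded to the nearest pixel, and the tolerance `tol` is a hypothetical parameter.

```python
import math


def consistency_mask(forward, backward, tol=1.0):
    """Return a 2D list of booleans: True where the forward vector is accepted."""
    h, w = len(forward), len(forward[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            fx, fy = forward[y][x]
            # Follow the forward vector to its landing pixel in the next frame.
            tx = math.floor(x + fx + 0.5)
            ty = math.floor(y + fy + 0.5)
            if not (0 <= tx < w and 0 <= ty < h):
                continue  # lands outside the frame: reject
            bx, by = backward[ty][tx]
            # A consistent pair cancels out: forward + backward is close to zero.
            mask[y][x] = math.hypot(fx + bx, fy + by) <= tol
    return mask


forward = [[(1, 0), (1, 0), (1, 0), (1, 0)]]
backward = [[(-1, 0), (-1, 0), (3, 0), (-1, 0)]]
mask = consistency_mask(forward, backward)
```

In this one-row example, the second vector is rejected because the backward vector at its landing pixel does not point back to it, and the last vector is rejected because it leaves the frame.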

Using available flow vectors and advanced CUDA-accelerated techniques, more accurate flow vectors are generated to fill in the rejected flow vectors. Figure 7 shows the infilled flow vector map generated.

Picture shows the rejected flow vectors filled in with other colors.
Figure 7. Infilled flow vector map
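
One simple way to fill rejected vectors is to average the valid neighbors and repeat until nothing changes. The sketch below is a plain Python stand-in for that idea, not the CUDA-accelerated technique the library uses, which is not detailed here. The function name and data layout (2D lists of (dx, dy) tuples plus a boolean validity mask) are assumptions.

```python
def infill_flow(flow, mask):
    """Fill rejected vectors with the mean of their valid 4-neighbors."""
    h, w = len(flow), len(flow[0])
    flow = [row[:] for row in flow]  # work on copies, leave inputs untouched
    mask = [row[:] for row in mask]
    while True:
        updates = {}
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    continue
                nbrs = [flow[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]]
                if nbrs:
                    updates[(y, x)] = (sum(v[0] for v in nbrs) / len(nbrs),
                                       sum(v[1] for v in nbrs) / len(nbrs))
        if not updates:
            break  # nothing left to fill, or no valid vectors to fill from
        # Apply the whole pass at once so filled values spread one ring per pass.
        for (y, x), vec in updates.items():
            flow[y][x] = vec
            mask[y][x] = True
    return flow


filled = infill_flow([[(1.0, 0.0), (9.0, 9.0), (3.0, 0.0)]],
                     [[True, False, True]])
```

Here the rejected middle vector is replaced by the average of its two valid neighbors.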

Using a complete flow vector map between the two frames, the algorithm generates an interpolated frame between the two input frames. Such an image may contain a few holes (pixels that don’t have valid color). Figure 8 shows a few small gray regions near the head of the horse and in the sky that are holes.

Image shows a closeup of pixel regions without valid color on the interpolated frame.
Figure 8. New interpolated frame with gray regions
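
The interpolation step can be illustrated with a basic forward warp: each pixel of the previous frame is moved along a fraction `t` of its flow vector, and any destination pixel that receives no color is left as a hole. This is a simplified sketch under stated assumptions (grayscale frames as 2D lists, nearest-pixel rounding, last write wins on collisions), not the library's actual algorithm.

```python
import math


def warp_to_midframe(prev_frame, flow, t=0.5):
    """Forward-warp prev_frame by t * flow; unfilled pixels stay None (holes)."""
    h, w = len(prev_frame), len(prev_frame[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Move the pixel part of the way toward its position in the next frame.
            tx = math.floor(x + t * dx + 0.5)
            ty = math.floor(y + t * dy + 0.5)
            if 0 <= tx < w and 0 <= ty < h:
                out[ty][tx] = prev_frame[y][x]
    return out


def find_holes(frame):
    """Return (x, y) coordinates of pixels that received no color."""
    return [(x, y) for y, row in enumerate(frame)
            for x, value in enumerate(row) if value is None]


mid = warp_to_midframe([[10, 20, 30]], [[(0, 0), (2, 0), (0, 0)]])
holes = find_holes(mid)
```

In this one-row example, the middle pixel moves one position to the right at the halfway point, which leaves its original location without a color.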

Holes in the interpolated frame are filled using image-domain hole-infilling techniques to generate the final interpolated image. This is the output of the FRUC library (Figure 9).

Image shows the interpolated frame with pixel holes filled in.
Figure 9. Output of the FRUC library

The calling application can interleave this interpolated frame with the original frames to increase the frame rate of a video or game. Figure 10 shows the interpolated frame interleaved between the previous and next images.

GIF shows interpolated frame between original two-frame GIF.
Figure 10. Interpolated frame interleaved
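
The interleaving done by the calling application can be sketched in a few lines. Here `interpolate` is a placeholder callback standing in for whatever produces the in-between frame (for example, a call into the FRUC library). It is not an SDK function name, and the averaging used in the example is only a stand-in so the sketch runs on plain numbers.

```python
def upconvert(frames, interpolate):
    """Insert one interpolated frame between every pair of consecutive frames."""
    if not frames:
        return []
    out = [frames[0]]
    for prev_frame, next_frame in zip(frames, frames[1:]):
        out.append(interpolate(prev_frame, next_frame))  # in-between frame
        out.append(next_frame)                           # original frame
    return out


# Stand-in "frames" are numbers and the stand-in interpolator averages them.
doubled = upconvert([0, 10, 20], lambda a, b: (a + b) / 2)
```

An input of N frames yields 2N - 1 frames, so playing the result over the same duration roughly doubles the frame rate, as in the 15 fps to 30 fps comparison in Video 1.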

Lastly, to expand the platforms that can harness the NVOFA hardware, Optical Flow SDK 4.0 also introduces support for Windows Subsystem for Linux.

Harness the NVIDIA Ada Architecture and the FRUC library when the NVIDIA Optical Flow SDK 4.0 is available in October. If you have any questions, contact Video DevTech Support.