
Improving Video Quality and Performance with AV1 and NVIDIA Ada Lovelace Architecture


AV1 is the new gold standard video format, offering better compression efficiency and quality than the older H.264 and H.265 formats. It is the most recent royalty-free, efficient video coding format standardized by the Alliance for Open Media.

NVIDIA Ampere architecture introduced hardware-accelerated AV1 decoding. NVIDIA Ada Lovelace architecture supports both AV1 encoding and decoding. NVIDIA Ada architecture also brings back support for multiple encoders per GPU (up to three encoders and four decoders per GPU), enabling higher throughput compared to previous generations. 

NVIDIA NVENC AV1 performance

NVIDIA NVENC AV1 delivers substantially better compression efficiency than H.264 and HEVC, along with better performance. To quantify the quality improvements, we compared peak signal-to-noise ratio (PSNR) and video multimethod assessment fusion (VMAF) scores for AV1 and H.264. PSNR and VMAF are video quality metrics frequently used to gauge encoding quality.

PSNR score

PSNR is a decibel value that quantifies the reconstruction quality of images. It is the ratio between the maximum possible power of a signal (the original image or video) and the power of the noise introduced by compression. As shown in Figure 1, NVENC AV1 encoding yields about 1.5-2 dB higher PSNR than NVENC H.264 at the same bit rate. In other words, to achieve the same PSNR, H.264 requires a considerably higher bit rate than AV1. For example, AV1 reaches 42 dB PSNR at 7 Mbps, whereas H.264 needs 11 Mbps.
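To make the definition concrete, the sketch below computes PSNR as 10 * log10(MAX^2 / MSE), where MAX is the peak sample value (255 for 8-bit video) and MSE is the mean squared error between the original and the decoded samples. It is a minimal illustration on plain Python lists, assuming flattened single-plane samples; the function name `psnr` is hypothetical, and real measurement tools typically compute PSNR per frame and per plane before averaging.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """PSNR in dB between two equally sized sample sequences (e.g. a flattened luma plane).

    PSNR = 10 * log10(MAX^2 / MSE), with MAX the peak sample value
    (255 for 8-bit, 1023 for 10-bit) and MSE the mean squared error.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no compression noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every sample is off by 1, so MSE = 1 and PSNR = 20 * log10(255).
ref = [16, 128, 235, 64]
dist = [17, 127, 234, 65]
print(f"{psnr(ref, dist):.2f} dB")  # prints 48.13 dB
```

Because the scale is logarithmic, a 1.5-2 dB PSNR gain corresponds to a noticeably lower mean squared error at the same bit rate.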

This translates into bit rate savings of up to 40% for AV1 over H.264 at 1080p60 at similar quality, using a low-latency quality preset for H.264. For two hours of 1080p video streamed at 5 Mbps, a 40% saving amounts to about 1.8 GB of data. Similar bit rate savings were observed at 720p, 1440p, and 4K.
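The savings figures are simple arithmetic, sketched below under two assumptions: the 5 Mbps stream is the H.264 baseline, and GB means decimal gigabytes (10^9 bytes). The helper names are hypothetical. Note that the single 42 dB operating point quoted above works out to roughly 36%, while 40% is the upper end of the savings reported in the text.

```python
def bitrate_savings(reference_mbps: float, candidate_mbps: float) -> float:
    """Fraction of bit rate saved by the candidate encoder at equal quality."""
    return 1.0 - candidate_mbps / reference_mbps


def saved_gigabytes(stream_mbps: float, hours: float, savings: float) -> float:
    """Data saved, in decimal GB, for a stream of the given bit rate and duration."""
    total_megabits = stream_mbps * hours * 3600
    total_gigabytes = total_megabits / 8 / 1000  # megabits -> megabytes -> gigabytes
    return total_gigabytes * savings


# 42 dB PSNR: AV1 needs about 7 Mbps where H.264 needs about 11 Mbps.
print(f"{bitrate_savings(11, 7):.1%}")          # prints 36.4%
# Two hours of a 5 Mbps stream with 40% bit rate savings.
print(f"{saved_gigabytes(5, 2, 0.40):.2f} GB")  # prints 1.80 GB
```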

Figure 1. PSNR versus bit rate for NVENC AV1 and NVENC H.264. Average data collected from more than 100 videos.

VMAF score

VMAF is a video quality metric that correlates highly with human perception of streaming video quality. The VMAF scores plotted in Figure 2 were collected on the same set of videos used for the PSNR evaluation. NVENC AV1 outperforms NVENC H.264 in quality. AV1 performs especially well relative to H.264 at low bit rates, so it provides better visual quality in bandwidth-constrained (poor QoS) scenarios. As expected, the gap in perceptual quality between H.264 and AV1 narrows as the bit rate increases.

Figure 2. Increase in perceived video quality (VMAF) as bit rate rises for NVENC AV1 and NVENC H.264

Video 1 shows a quality comparison of AV1 video encoded on an NVIDIA Ada Lovelace architecture GPU versus H.264 video encoded with the x264 software encoder. The H.264 video is encoded with the medium preset at 30 Mbps, while the AV1 video is encoded at 18 Mbps with a high-performance preset. The quality of both videos is comparable, even though the AV1 stream uses 40% less bit rate. The AV1 encoder runs at 500 fps, almost 9x faster than the x264 encoder.

Video 1. Quality comparison of AV1 encoded at 18 Mbps on an NVIDIA Ada Lovelace GPU versus H.264 encoded at 30 Mbps with x264

Performance in frames per second across resolutions/presets

NVENC performance has increased steadily with every generation. The NVIDIA Turing and NVIDIA Ampere architectures both had one encoder per chip, while the NVIDIA Ada architecture supports up to three encoders per chip.

With the NVIDIA Ada architecture, the driver automatically balances the load among the multiple encoders. Any application can therefore use the additional NVENCs, and gain the higher encode throughput they provide, without special code. However, throughput is still subject to clocks, hardware performance limits, and available memory.

The NVENCODE API exposes several presets, rate control modes, and tuning information modes for programming the hardware for different use cases. Combining these parameters enables video encoding at varying quality and performance levels, so an application can select the quality-versus-performance tradeoff it needs at a fine granularity.
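One way to picture this parameter space is as a grid of preset, tuning, and rate-control choices that an application can sweep or filter. The Python sketch below is hypothetical: its enums only mirror, in their docstrings, the NVENCODE API names (`NV_ENC_PRESET_P1_GUID` through `NV_ENC_PRESET_P7_GUID`, the `NV_ENC_TUNING_INFO_*` values, and the `NV_ENC_PARAMS_RC_*` modes); it does not call the SDK, and it covers only the three tuning modes discussed in this article.

```python
from enum import Enum
from itertools import product
from typing import List, NamedTuple


class Preset(Enum):
    """Mirrors NV_ENC_PRESET_P1_GUID ... NV_ENC_PRESET_P7_GUID."""
    P1 = 1  # fastest, lowest quality
    P2 = 2
    P3 = 3
    P4 = 4
    P5 = 5
    P6 = 6
    P7 = 7  # slowest, highest quality


class Tuning(Enum):
    """Mirrors NV_ENC_TUNING_INFO_HIGH_QUALITY / _LOW_LATENCY / _ULTRA_LOW_LATENCY."""
    HIGH_QUALITY = "high quality"
    LOW_LATENCY = "low latency"
    ULTRA_LOW_LATENCY = "ultra-low latency"


class RateControl(Enum):
    """Mirrors NV_ENC_PARAMS_RC_CONSTQP / _VBR / _CBR."""
    CONSTQP = "constqp"
    VBR = "vbr"
    CBR = "cbr"


class EncodeConfig(NamedTuple):
    preset: Preset
    tuning: Tuning
    rate_control: RateControl


def sweep_configs() -> List[EncodeConfig]:
    """Every preset x tuning x rate-control combination, starting at P1."""
    return [EncodeConfig(p, t, rc) for p, t, rc in product(Preset, Tuning, RateControl)]


configs = sweep_configs()
print(len(configs))  # prints 63 (7 presets x 3 tunings x 3 rate-control modes)

# Example filter: fast, low-latency CBR settings, e.g. for live streaming.
live = [c for c in configs
        if c.tuning is Tuning.LOW_LATENCY and c.rate_control is RateControl.CBR
        and c.preset.value <= 4]
print([c.preset.name for c in live])  # prints ['P1', 'P2', 'P3', 'P4']
```

Each tuple would then be translated into the corresponding encoder initialization parameters and measured for quality and speed.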

Table 1. Encode frames per second (fps) of 1 NVENC at 2160p resolution. Measurements were taken on A10 (NVIDIA Ampere architecture) and L40 (NVIDIA Ada Lovelace architecture) with NVDCLK of 1485 MHz and 1905 MHz, respectively.

Max resolution support

Table 2 shows the max resolution support for AV1, HEVC, and H.264. NVIDIA Ada is the first generation of GPUs supporting 10-bit 8K60 encoding for AV1 and HEVC.

Table 2. Supported encoding resolutions for NVENC AV1, H.265, and H.264 

The dedicated NVENC encoder hardware can perform 8- and 10-bit AV1 encoding, in addition to 8-bit H.264 encoding and 8- and 10-bit HEVC encoding. For more details about NVENC capabilities, see the NVIDIA Hardware Video Encoder documentation.

The hardware-accelerated video encoding and decoding functionality is accessible to applications through the NVENCODE and NVDECODE APIs, respectively, which are part of the NVIDIA Video Codec SDK.

NVIDIA Video Codec SDK 12.0 features

Video Codec SDK 12.0, which was released in November 2022, contains support for NVIDIA Ada Lovelace GPU hardware, along with the new features detailed below.

Split encoding 8K60

Video Codec SDK 12.0 on NVIDIA Ada GPUs supports a feature called Split Frame Encoding for AV1 and HEVC, which encodes frames with resolutions greater than 4K using multiple encoders when they are available. With this feature, each frame is split into two parts, and if the GPU contains multiple encoders, each part is sent to a different encoder. This improves overall encoding performance.

This feature is enabled automatically only at high resolutions, under the conditions shown in Table 3. Splitting a frame across independent encoders may produce lower quality than encoding the entire frame on a single encoder, so this performance optimization is not enabled for all presets and resolutions.

Tuning info / Preset | p1 (fastest) | p2          | p3          | p4          | p5     | p6     | p7 (slowest)
High quality         | Split frame  | Split frame | Normal      | Normal      | Normal | Normal | Normal
Low latency          | Split frame  | Split frame | Split frame | Split frame | Normal | Normal | Normal
Ultra-low latency    | Split frame  | Split frame | Split frame | Split frame | Normal | Normal | Normal
Table 3. Preset and tuning criteria that determine when split encoding is enabled
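Table 3 can be read as a simple lookup: for each tuning mode, split frame encoding applies up to a certain preset. The sketch below encodes that rule together with the conditions stated above (AV1 or HEVC, resolution greater than 4K, multiple encoders). It is a rough model for reasoning about expected behavior, not the driver's logic. The function name and tuning strings are hypothetical, and reading "greater than 4K" as "more pixels than 3840x2160" is an assumption.

```python
# Table 3: highest preset number (p1 = fastest) that still uses split frame encoding.
SPLIT_FRAME_MAX_PRESET = {
    "high quality": 2,
    "low latency": 4,
    "ultra-low latency": 4,
}

# Assumption: "greater than 4K" means more pixels than 3840x2160.
UHD_4K_PIXELS = 3840 * 2160


def split_frame_expected(codec: str, width: int, height: int, preset: int,
                         tuning: str, num_encoders: int) -> bool:
    """Rough model of when split frame encoding should kick in (hypothetical helper)."""
    if codec not in ("AV1", "HEVC"):
        return False  # split frame encoding is offered only for AV1 and HEVC
    if num_encoders < 2:
        return False  # assumption: nothing to split across with a single NVENC
    if width * height <= UHD_4K_PIXELS:
        return False  # only resolutions above 4K qualify
    if not 1 <= preset <= 7:
        raise ValueError("preset must be 1..7 (p1..p7)")
    return preset <= SPLIT_FRAME_MAX_PRESET.get(tuning, 0)


print(split_frame_expected("AV1", 7680, 4320, 4, "low latency", 3))   # prints True
print(split_frame_expected("AV1", 7680, 4320, 3, "high quality", 3))  # prints False
```

The features listed next can still turn split frame encoding off even when this check passes.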

If certain NVENC features are enabled, split frame encoding is disabled automatically, even when the preset and tuning conditions outlined in Table 3 are met. The features that are not compatible with split frame encoding are listed below.

HEVC

  • Weighted prediction
  • Alpha layer
  • Subframe mode
  • Bitstream output into video memory
  • Picture timing and buffering period SEI message insertion on the DX12 path

AV1

  • Bitstream output into video memory
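An application that wants to predict whether split frame encoding can apply could check its enabled features against the lists above. The sketch below is hypothetical: the feature identifiers are illustrative strings chosen for this example, not NVENCODE API names.

```python
from typing import Dict, Set

# Features that disable split frame encoding, per codec (from the lists above).
# The identifiers are illustrative strings, not NVENCODE API names.
SPLIT_FRAME_BLOCKERS: Dict[str, Set[str]] = {
    "HEVC": {
        "weighted_prediction",
        "alpha_layer",
        "subframe_mode",
        "bitstream_output_in_video_memory",
        "dx12_picture_timing_buffering_period_sei",
    },
    "AV1": {
        "bitstream_output_in_video_memory",
    },
}


def split_frame_blockers(codec: str, enabled_features: Set[str]) -> Set[str]:
    """Return the enabled features that would force split frame encoding off."""
    return enabled_features & SPLIT_FRAME_BLOCKERS.get(codec, set())


print(sorted(split_frame_blockers("HEVC", {"weighted_prediction", "lookahead"})))
# prints ['weighted_prediction']
```

Note that weighted prediction blocks split frame encoding only for HEVC; for AV1, only bitstream output into video memory is listed.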

Multiple NVENCs for higher throughput

Some NVIDIA Ada GPUs have more than one NVENC, which makes it possible to encode more streams in parallel. When a single stream is encoded, its frames are processed sequentially, even if they are sent to different NVENCs. Therefore, using multiple NVENCs does not improve throughput when encoding a single video stream, but it can increase overall throughput when encoding two or more video streams in parallel. On GPUs with multiple NVENCs, frames from different streams are scheduled across the NVENCs, keeping all of them fully utilized and thereby increasing throughput.
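As a rough illustration of why extra encoders help only when there are extra streams, the idealized model below assumes each stream keeps at most one NVENC busy at a time. It deliberately ignores split frame encoding, clocks, hardware performance limits, and memory, all of which affect real throughput. The function name and the 100 fps per-encoder figure are hypothetical.

```python
def ideal_aggregate_fps(per_encoder_fps: float, num_streams: int, num_encoders: int) -> float:
    """Idealized upper bound on total encode throughput across all streams.

    Assumes a stream's frames are encoded one after another, so a stream can
    keep at most one NVENC busy at a time; extra encoders only help when there
    are extra streams to feed them.
    """
    if num_streams < 1 or num_encoders < 1:
        raise ValueError("need at least one stream and one encoder")
    busy_encoders = min(num_streams, num_encoders)
    return per_encoder_fps * busy_encoders


# Hypothetical 100 fps per encoder on a GPU with three NVENCs.
for streams in (1, 2, 3, 5):
    print(streams, ideal_aggregate_fps(100.0, streams, 3))
# prints 100.0, 200.0, 300.0, 300.0 for 1, 2, 3, and 5 streams
```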

Multiple NVENCs also help in video editing workflows, in which independent sections of a video (split at GOP boundaries) can be sent to different NVENCs. The video can be split manually by the user (for example, at scene changes or where different clips are joined) or automatically by the application.

For example, a video can be split into three time segments, t0-t1, t1-t2, and t2-t3, where t0 < t1 < t2 < t3. With multiple encoders, these shorter segments can be encoded in parallel, resulting in higher overall encoding throughput.
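The splitting step can be sketched as follows: cut the frame range only at multiples of the GOP length so each segment starts with its own GOP and can be encoded independently, then run one encode job per segment concurrently. This is a minimal sketch under stated assumptions: a fixed GOP length, and a caller-supplied `encode_segment` callback standing in for whatever actually launches an encode session (for example an FFmpeg process). Both helper names are hypothetical. Because the driver balances load across NVENCs, the application only needs to run the sessions concurrently; `max_workers=3` simply matches the up-to-three encoders on NVIDIA Ada GPUs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

Segment = Tuple[int, int]  # half-open frame range [start, end)
T = TypeVar("T")


def split_at_gop_boundaries(total_frames: int, gop_length: int, num_segments: int) -> List[Segment]:
    """Split [0, total_frames) into up to num_segments pieces whose cut points
    fall on multiples of gop_length, so each piece can be encoded independently."""
    if total_frames <= 0 or gop_length <= 0 or num_segments <= 0:
        raise ValueError("all arguments must be positive")
    num_gops = -(-total_frames // gop_length)         # ceiling division
    gops_per_segment = -(-num_gops // num_segments)  # ceiling division
    step = gops_per_segment * gop_length
    return [(start, min(start + step, total_frames))
            for start in range(0, total_frames, step)]


def encode_in_parallel(segments: List[Segment],
                       encode_segment: Callable[[Segment], T],
                       max_workers: int = 3) -> List[T]:
    """Run one (hypothetical) encode job per segment; results keep timeline order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(encode_segment, segments))


segments = split_at_gop_boundaries(900, 60, 3)
print(segments)  # prints [(0, 300), (300, 600), (600, 900)]
```

The encoded segments can then be concatenated in order, since each one begins on a GOP boundary.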

Batch encoding is another feature that takes advantage of multiple encoders, and it is useful for transcoding workloads. Transcoding involves decoding an encoded input stream, scaling it, and re-encoding it in the desired formats and resolutions. This is easy on NVIDIA Ada GPUs, because the driver automatically load-balances the decoded streams and splits the work among the encoders.

Support for AV1 in FFmpeg

FFmpeg is the most popular multimedia transcoding tool and is used extensively for video and audio transcoding. FFmpeg supports NVENC-accelerated AV1 encoding and NVDEC-accelerated AV1 decoding, so applications that use FFmpeg now have access to GPU-accelerated AV1 encoding and decoding.
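A typical GPU transcode with FFmpeg decodes with NVDEC, optionally scales on the GPU, and encodes with the `av1_nvenc` encoder. The Python sketch below only builds the argument list for such a command. It assumes an FFmpeg build with NVENC/NVDEC and CUDA filter support (`av1_nvenc` requires a recent FFmpeg release, an NVIDIA Ada GPU, and a recent driver), and the helper names are hypothetical. The `-preset p1`...`p7` and `-tune hq/ll/ull` values correspond to the presets and tuning modes discussed above.

```python
import subprocess
from typing import List, Optional

PRESETS = {f"p{i}" for i in range(1, 8)}  # p1 (fastest) ... p7 (slowest)
TUNINGS = {"hq", "ll", "ull"}             # high quality, low latency, ultra-low latency


def av1_nvenc_transcode_cmd(src: str, dst: str, bitrate: str = "5M",
                            preset: str = "p4", tune: str = "hq",
                            scale: Optional[str] = None) -> List[str]:
    """Build an FFmpeg command: NVDEC decode -> optional GPU scale -> NVENC AV1 encode."""
    if preset not in PRESETS or tune not in TUNINGS:
        raise ValueError("unsupported preset or tune value")
    cmd = [
        "ffmpeg", "-y",
        "-hwaccel", "cuda",                # decode on the GPU (NVDEC)
        "-hwaccel_output_format", "cuda",  # keep decoded frames in GPU memory
        "-i", src,
    ]
    if scale:
        cmd += ["-vf", f"scale_cuda={scale}"]  # e.g. "1280:720", scaled on the GPU
    cmd += ["-c:v", "av1_nvenc", "-preset", preset, "-tune", tune, "-b:v", bitrate, dst]
    return cmd


def run(cmd: List[str]) -> None:
    """Execute the command; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(cmd, check=True)


print(" ".join(av1_nvenc_transcode_cmd("input.mp4", "output.mp4", scale="1280:720")))
```

To transcode, pass the list to `run(...)`. Launching several such commands at once is one way to exercise multiple NVENCs, with the driver distributing the streams across them.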

Summary

AV1's superior PSNR, VMAF, and bit rate savings over existing codecs, combined with split encoding performance on NVIDIA Ada GPUs, make it a very attractive option for video encoding. NVIDIA Ada GPUs support AV1 encoding and decoding, which applications can access through the latest version of the NVIDIA Video Codec SDK.
