Capturing video footage and playing games at 8K resolution with 60 frames per second (FPS) is now possible, thanks to advances in camera and display technologies. Leading multimedia companies, including RED Digital Cinema, Nikon, and Canon, have already introduced 8K60 cameras for both the consumer and professional markets.
On the display side, the HDMI 2.1 standard has made 8K60 widely available on both gaming monitors and smart TVs. While 8K60 provides stunning image quality and sharpness, it comes at the significant cost of more data to transfer and store.
Fast codecs are therefore paramount in bridging the gap between the sensors and the display. To make 8K60 widely available, the NVIDIA Ada Lovelace GPU architecture provides multiple NVENC engines to accelerate video encoding performance while maintaining high image quality. (Two NVENCs are provided with NVIDIA RTX 4090 and 4080, and three NVENCs with NVIDIA RTX 6000 Ada Generation or L40.)
In practice, this can double or triple the encoding performance with a single GPU when compared to previous generations, enabling 8K60 video encoding and beyond.
This post showcases how the multiple available NVENCs in NVIDIA Ada Lovelace architecture are leveraged using a split-frame encoding (SFE) technique to achieve 8K60 video encoding performance. We explore how this SFE technique works at 4K and 8K resolutions and how to enable it through the NVENCODE API. Finally, we present several benchmarks that show, in practice, the massive performance benefits of this technique.
Split-frame encoding
SFE is a technique that exploits the multiple NVENCs present in NVIDIA Ada Lovelace GPUs when encoding a single video sequence: each frame is split, and each partial frame is encoded by a different NVENC engine, so the encoding work is spread across the available NVENCs (Figure 1). It was introduced in NVIDIA Video Codec SDK 12.0. Until now, however, SFE was only enabled implicitly, based on the encoding preset, tuning information, and resolution, to support 8K live encoding in HEVC or AV1. (Note that 8K is not supported on H.264.) To learn more, see Improving Video Quality and Performance with AV1 and NVIDIA Ada Lovelace Architecture.
With NVIDIA Video Codec SDK 12.1, you can explicitly enable or disable the SFE feature. This means that SFE can now take advantage of the two NVENCs in the NVIDIA RTX 4090, or the three NVENCs in the NVIDIA RTX 6000 Ada Generation, without resolution, preset, or tuning information restrictions. With two-way or three-way SFE, an application can up to double or triple its encoding performance on a single video sequence. Such performance is especially important when encoding 8K, which is a particularly demanding use case.
SFE at 4K and 8K resolution
How SFE is applied depends on the resolution and the selected video codec. When using HEVC with SFE turned off, expect only a single slice per frame. With two-way or three-way SFE, each frame is divided into two or three slices, respectively, stacked as horizontal strips. This applies to both 4K and 8K resolutions (Figure 2). The same applies to AV1 when encoding video up to 4K resolution, except that AV1 uses tiles instead of slices to create these independent frame partitions.
When encoding 8K video with AV1, consider that the maximum tile resolution defined by the standard is 4096 x 2304 pixels. This means that each 8K frame is split into four tiles, each with a quarter of the resolution (3840 x 2160 pixels). When SFE is used, to achieve the same performance benefits as for HEVC, each tile is further split into horizontal strips, giving eight tiles for two-way SFE and 12 tiles for three-way SFE (Figure 3).
Table 1 summarizes the number of partial frames and resolution to expect per codec and input video resolution.
Codec | Video resolution | Partial frames, no SFE | Partial frames, two-way SFE | Partial frames, three-way SFE
HEVC | 4K | 1x 3840 x 2160 | 2x 3840 x 1080 | 3x 3840 x 720
HEVC | 8K | 1x 7680 x 4320 | 2x 7680 x 2160 | 3x 7680 x 1440
AV1 | 4K | 1x 3840 x 2160 | 2x 3840 x 1080 | 3x 3840 x 720
AV1 | 8K | 4x 3840 x 2160 | 8x 3840 x 1080 | 12x 3840 x 720
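The partitioning rule behind Table 1 can be sketched in a few lines of Python. This is only a simplified model of the description above, not NVENC driver logic: the function name `partial_frames` and the constant `AV1_MAX_TILE` are made up for illustration, the tile limit is taken from the text, and frame dimensions are assumed to divide evenly.

```python
import math

# Maximum AV1 tile size (width, height) as stated in the text above.
AV1_MAX_TILE = (4096, 2304)

def partial_frames(codec, width, height, sfe_ways=1):
    """Return (count, part_width, part_height) for one frame.

    Hypothetical helper: models Table 1 only, assuming even division.
    """
    if codec == "AV1":
        # AV1 first needs enough tiles to respect the maximum tile size.
        cols = math.ceil(width / AV1_MAX_TILE[0])
        rows = math.ceil(height / AV1_MAX_TILE[1])
    elif codec == "HEVC":
        # HEVC starts from a single slice covering the whole frame.
        cols, rows = 1, 1
    else:
        raise ValueError(f"unsupported codec: {codec}")
    # SFE cuts every base partition into `sfe_ways` horizontal strips.
    count = cols * rows * sfe_ways
    return count, width // cols, height // (rows * sfe_ways)

for codec in ("HEVC", "AV1"):
    for name, (w, h) in (("4K", (3840, 2160)), ("8K", (7680, 4320))):
        row = [partial_frames(codec, w, h, ways) for ways in (1, 2, 3)]
        print(codec, name, row)
```

Running the loop reproduces the rows of Table 1, for example 12 partial frames of 3840 x 720 for AV1 at 8K with three-way SFE.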
Enabling split-frame encoding
With the API update in Video Codec SDK 12.1, the latest NVENCODE API header includes NV_ENC_SPLIT_ENCODE_MODE. This enables control over SFE, as shown in Table 2. It is now quite easy to configure SFE using either implicit or explicit modes. NV_ENC_SPLIT_AUTO_MODE and NV_ENC_SPLIT_AUTO_FORCED_MODE provide a way to use the SFE implicit mode. To learn more, see Improving Video Quality and Performance with AV1 and NVIDIA Ada Lovelace Architecture.
The remaining options refer to explicit SFE configuration. These include forcing SFE to be disabled, two-way, or three-way. Forcing two-way or three-way SFE requires an NVIDIA GPU with the appropriate number of NVENC engines.
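NV_ENC_SPLIT_ENCODE_MODE | SFE type | Description
NV_ENC_SPLIT_AUTO_MODE (0) | Auto Mode (default) | Two-way SFE will be implicitly triggered based on input video resolution and encoding parameters
NV_ENC_SPLIT_AUTO_FORCED_MODE (1) | Force Auto Mode | Two-way SFE will be implicitly triggered based on input video resolution and encoding parameters
NV_ENC_SPLIT_TWO_FORCED_MODE (2) | Force two-way SFE | The respective SFE configuration will be used regardless of the input video and encoding parameters
NV_ENC_SPLIT_THREE_FORCED_MODE (3) | Force three-way SFE | The respective SFE configuration will be used regardless of the input video and encoding parameters
NV_ENC_SPLIT_DISABLE_MODE (15) | Force no SFE | The respective SFE configuration will be used regardless of the input video and encoding parameters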
Table 2. NV_ENC_SPLIT_ENCODE_MODE options
The latest Video Codec SDK encoding sample, AppEncMultiInstance, also highlights how to add explicit SFE control to an application.
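To make the choice between these modes concrete, here is a minimal Python sketch that mirrors the numeric values from Table 2 and picks a mode for a requested split. It is not the NVENCODE API itself, which is a C header: the class `SplitEncodeMode` and the helper `choose_split_mode` are hypothetical names, and the engine-count check simply encodes the requirement that forced two-way or three-way SFE needs that many NVENCs.

```python
from enum import IntEnum

class SplitEncodeMode(IntEnum):
    """Python mirror of the NV_ENC_SPLIT_ENCODE_MODE values in Table 2."""
    AUTO = 0           # NV_ENC_SPLIT_AUTO_MODE
    AUTO_FORCED = 1    # NV_ENC_SPLIT_AUTO_FORCED_MODE
    TWO_FORCED = 2     # NV_ENC_SPLIT_TWO_FORCED_MODE
    THREE_FORCED = 3   # NV_ENC_SPLIT_THREE_FORCED_MODE
    DISABLE = 15       # NV_ENC_SPLIT_DISABLE_MODE

def choose_split_mode(requested_ways, nvenc_count):
    """Hypothetical helper: map a requested split to a mode value.

    requested_ways: None (leave it implicit), 1 (no SFE), 2, or 3.
    nvenc_count: number of NVENC engines on the target GPU.
    """
    if requested_ways is None:
        return SplitEncodeMode.AUTO
    if requested_ways == 1:
        return SplitEncodeMode.DISABLE
    if requested_ways not in (2, 3):
        raise ValueError("SFE supports only two-way or three-way splits")
    if nvenc_count < requested_ways:
        # Forced modes need at least as many engines as strips.
        raise ValueError(
            f"{requested_ways}-way SFE needs {requested_ways} NVENCs, "
            f"GPU has {nvenc_count}"
        )
    return (SplitEncodeMode.TWO_FORCED if requested_ways == 2
            else SplitEncodeMode.THREE_FORCED)

# Example: a GPU with three NVENCs can be asked for a three-way split.
mode = choose_split_mode(3, nvenc_count=3)
print(mode.name, int(mode))
```

In a real application, the selected value would be written into the encoder configuration before the encode session is initialized; the AppEncMultiInstance sample is the place to see how that is wired up.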
Performance and compression efficiency benchmarking
The configurations and 8K input videos that were tested are listed in Table 3.
Benchmarking configuration | |
GPU | NVIDIA RTX 6000 Ada Generation (3 NVENCs) |
Input videos | 7 videos (4 gaming and 3 natural) |
Encoders | HEVC and AV1 |
Presets | P1 (fastest), P4 (medium) and P7 (slowest) |
Tuning Information | Low latency (LL) and high quality (HQ) |
Bitrates | 15, 20, 60, 150, and 250 Mbps |
Two types of benchmarks were performed:
Transcoding Performance: Transcoding was used to minimize the influence of system bottlenecks (file I/O and memory copies between CPU and GPU). To test transcoding, the original 8K videos were pre-encoded with very high bitrates. During transcoding, NVDEC decodes the video, which is then encoded by one, two, or three NVENCs when no split, two-way SFE, or three-way SFE is used, respectively. The performance results are shown in Figures 4 and 5 for HEVC and AV1, respectively.
Compression Efficiency Penalty: By splitting encoding work across several NVENCs, a compression efficiency penalty is expected. To measure this penalty, BD-RATE was used across several benchmark configurations to compare the compression efficiency between no-split, two-way SFE, and three-way SFE. This metric indicates the average compression efficiency penalty for the same objective quality. The objective quality metric used in these benchmarks was PSNR. The compression efficiency penalty results are shown in Figures 6 and 7 for HEVC and AV1, respectively.
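For readers unfamiliar with BD-RATE, the following Python sketch shows the idea: compare two rate/quality curves in the log-bitrate domain over their common PSNR range and report the average bitrate difference as a percentage. It is a simplified variant that interpolates piecewise-linearly, whereas the usual BD-RATE calculation fits a polynomial, so its numbers will not match a standard tool exactly. The function names and the PSNR values in the example are invented for illustration and are not benchmark results.

```python
import math

def _interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the curve range")

def bd_rate_linear(ref, test):
    """Average bitrate change (%) of `test` vs `ref` at equal PSNR.

    ref, test: lists of (bitrate, psnr) points, distinct PSNR values.
    Simplified: piecewise-linear instead of a polynomial fit.
    """
    ref = sorted(ref, key=lambda p: p[1])
    test = sorted(test, key=lambda p: p[1])
    ref_q, ref_lr = [q for _, q in ref], [math.log(r) for r, _ in ref]
    test_q, test_lr = [q for _, q in test], [math.log(r) for r, _ in test]
    # Only the PSNR interval covered by both curves can be compared.
    lo, hi = max(ref_q[0], test_q[0]), min(ref_q[-1], test_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in PSNR")
    inner = [q for q in ref_q + test_q if lo < q < hi]
    knots = sorted({lo, hi, *inner})
    # Trapezoidal integral of the log-bitrate difference over PSNR.
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        da = _interp(a, test_q, test_lr) - _interp(a, ref_q, ref_lr)
        db = _interp(b, test_q, test_lr) - _interp(b, ref_q, ref_lr)
        area += 0.5 * (da + db) * (b - a)
    return (math.exp(area / (hi - lo)) - 1.0) * 100.0

# Invented example: bitrates in Mbps, PSNR values are made up.
no_split = [(15, 36.0), (20, 37.0), (60, 40.0), (150, 42.5), (250, 44.0)]
# A curve needing 2% more bitrate for the same PSNR at every point.
with_sfe = [(rate * 1.02, psnr) for rate, psnr in no_split]
print(round(bd_rate_linear(no_split, with_sfe), 3))
```

A positive value means the tested configuration needs more bitrate than the reference for the same objective quality, which is how a compression efficiency penalty shows up.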
When using two-way SFE, expect an average performance scaling of about 1.8x for both HEVC and AV1. Three-way SFE can achieve a performance scaling of up to 2.95x for HEVC and 2.31x for AV1. In practice, this enables 8K60 video encoding with NVIDIA RTX 6000 Ada Generation, using both HEVC and AV1, with LL and HQ tuning information at a medium preset (P4).
Given that one to three NVENCs and a single NVDEC are used, NVDEC may become the bottleneck when transcoding 8K. For this reason, the fastest preset (P1) tops out at about 120 FPS on average, which is the average maximum throughput of a single NVDEC at 8K.
You can observe better scaling as long as NVDEC isn’t the bottleneck. This is the case for slower presets, such as P4 and P7, where the performance scaling is much better in comparison to P1.
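The bottleneck reasoning can be illustrated with a toy throughput model in Python. It assumes ideal linear encode scaling across NVENCs, which real results only approximate, and it uses the roughly 120 FPS NVDEC ceiling quoted above as a default. The function names and the single-NVENC rates in the example are hypothetical, not measurements.

```python
def transcode_fps(single_nvenc_fps, sfe_ways, nvdec_cap_fps=120.0):
    """Throughput of a decode -> encode pipeline: the slowest stage wins.

    Toy model: assumes encode speed scales linearly with `sfe_ways`.
    """
    encode_fps = single_nvenc_fps * sfe_ways
    return min(encode_fps, nvdec_cap_fps)

def effective_scaling(single_nvenc_fps, sfe_ways, nvdec_cap_fps=120.0):
    """Observed speedup of SFE over no split under the same decode cap."""
    baseline = transcode_fps(single_nvenc_fps, 1, nvdec_cap_fps)
    return transcode_fps(single_nvenc_fps, sfe_ways, nvdec_cap_fps) / baseline

# Hypothetical single-NVENC rates for a fast and a slow preset.
for label, fps in (("fast preset", 70), ("slow preset", 25)):
    print(label, transcode_fps(fps, 3), round(effective_scaling(fps, 3), 2))
```

With the hypothetical fast preset, three-way SFE hits the decode ceiling and the observed scaling stays well below 3x, while the slow preset remains encode-bound and scales fully. This matches the pattern described for P1 versus P4 and P7.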
In general, the compression efficiency penalty isn’t expected to exceed 2% for two-way SFE and 4% for three-way SFE when using BD-RATE (PSNR) to measure quality. This penalty is more noticeable for HQ tuning information than for LL. Additionally, according to the benchmarks performed, this penalty is slightly more prominent when using HEVC compared to AV1.
Although this compression efficiency penalty is relatively low compared to the performance gain, it's up to the user to determine whether their use case benefits more from performance or from compression efficiency. Regardless, the NVENCODE API provides full control over SFE, not only for 8K but also for lower resolutions.
Summary
Split-frame encoding (SFE) is a breakthrough feature that unlocks video encoding capabilities at 8K60 and beyond. It empowers users to harness the power of multiple NVENCs within NVIDIA Ada Lovelace architecture GPUs for encoding a single video sequence. This post has explained the performance advantages of two-way SFE (using two NVENCs) and three-way SFE (using three NVENCs). The latest NVIDIA Video Codec SDK provides explicit control over SFE for optimal customization.