The release of NVIDIA Video Codec SDK 13.0 marks a significant upgrade, adding support for the latest-generation NVIDIA Blackwell GPUs. This version brings a wealth of improvements aimed at elevating both video encoding and decoding capabilities. From enhanced compression efficiency to better throughput and encoding quality, SDK 13.0 addresses the ever-evolving demands of the video ecosystem.
Here are some of the key features introduced in this update.
Encode features:
- Improved compression efficiency: Achieve better video quality at lower bitrates.
- New YUV 4:2:2 encoding support (H.264 and HEVC): Enables a broader range of use cases, particularly in professional video production and broadcasting.
- Higher bit-depth encoding: Introduces support for 10-bit encoding in H.264, and new hardware capabilities to encode 8-bit content as 10-bit.
- Interlaced encoding (H.264): Adds interlaced encoding support in H.264.
- AV1 lookahead level and UHQ: Introduces lookahead level and UHQ modes for AV1, for latency-tolerant use cases that require the highest possible video quality.
- MV-HEVC support: Supports two views to improve compression for stereoscopic content. For more information, see Enabling Stereoscopic and 3D Views Using MV-HEVC in NVIDIA Video Codec SDK 13.0.
Decode features:
- 4:2:2 decode support (H.264 and HEVC): Expands decoding capabilities, which is especially valuable for professional video workflows.
- Higher bit-depth H.264 decoding: Introduces support for 10-bit decoding in H.264.
- 2x H.264 throughput on NVIDIA Blackwell: Brings an impressive 2x throughput improvement in H.264 decoding compared to previous-generation GPUs.
- H.264 8K support: Now handles ultra-high-definition video decoding with ease, providing future-proof capabilities for 8K content.
- Dynamic decode surface allocation: Enables applications to minimize GPU memory usage.
Encode quality enhancements in NVIDIA Blackwell
Here’s more information about the key encode features in this update.
Improved compression efficiency
NVIDIA encoder (NVENC) hardware in NVIDIA Blackwell includes many enhancements that improve compression efficiency. For HEVC and AV1, these include improved motion estimation, with enhanced subpixel search, and better rate-distortion optimization (RDO), among others. These enhancements apply across presets and provide a significant quality gain over ADA-generation GPUs.
4:2:2 chroma subsampling
In previous generations, NVENC supported formats including 4:2:0 and 4:4:4. YUV 4:4:4 retains full color, resulting in larger file sizes and higher bandwidth requirements to transfer data. In 4:2:0, the full information is retained in the luminance channel, but the chroma channels contain only 25% of the original color content.
To overcome the loss in color, NVIDIA Blackwell introduces 4:2:2 chroma. 4:2:2 retains 50% of the color information compared to 4:4:4 but reduces the required bandwidth to transfer video data.
4:2:2 is popular in the video editing and broadcasting ecosystem because it offers higher color resolution than 4:2:0 with lower bandwidth requirements than 4:4:4.
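The 25% and 50% figures above follow directly from how many chroma samples each mode keeps. The following is a minimal sketch of that arithmetic in plain C. The type and function names are hypothetical and local to this example, not part of the NVENCODE API, and the sketch assumes even frame dimensions and tightly packed planes.

```c
#include <assert.h>
#include <stddef.h>

/* Chroma sampling modes discussed above (names are local to this sketch). */
typedef enum { CHROMA_420, CHROMA_422, CHROMA_444 } ChromaMode;

/* Number of chroma samples (U plus V) in a width x height frame.
   Assumes even width and height. */
size_t chroma_samples(ChromaMode mode, size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    switch (mode) {
    case CHROMA_420: return 2 * (width / 2) * (height / 2); /* halved both ways */
    case CHROMA_422: return 2 * (width / 2) * height;       /* halved horizontally */
    case CHROMA_444: return 2 * width * height;             /* full resolution */
    }
    return 0;
}

/* Total bytes for one uncompressed frame. bytes_per_sample is 1 for 8-bit
   content and 2 for 10-bit content stored in 16-bit words. */
size_t frame_bytes(ChromaMode mode, size_t width, size_t height,
                   size_t bytes_per_sample)
{
    size_t luma = width * height;
    return (luma + chroma_samples(mode, width, height)) * bytes_per_sample;
}
```

For a 1920×1080 8-bit frame, this gives 3,110,400 bytes for 4:2:0, 4,147,200 bytes for 4:2:2, and 6,220,800 bytes for 4:4:4.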
As with YUV 4:2:0, semi-planar layouts are supported for 4:2:2 for 8-bit and 10-bit depths. The NVENCODE API introduces two new formats for YUV 4:2:2 semi-planar inputs:
- NV_ENC_BUFFER_FORMAT_NV16, for 8-bit YUV 4:2:2
- NV_ENC_BUFFER_FORMAT_P210, for 10-bit YUV 4:2:2
Setting chromaFormatIdc=2 enables 4:2:2 encoding. As with 4:2:0 and 4:4:4, an application can also pass ARGB input with chromaFormatIdc=2 to generate 4:2:2 subsampled encoded output. The color space conversion from ARGB to YUV 4:2:2 is done inside the encode driver using a CUDA kernel.
The compression efficiency for 4:2:2 input is similar to 4:2:0 and 4:4:4 formats.
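To make the semi-planar 4:2:2 layout concrete, here is a small sketch that converts planar 8-bit YUV 4:4:4 into an NV16-style buffer: a full-resolution Y plane followed by a plane of interleaved U and V samples at half horizontal resolution. The function name is hypothetical, the buffer is assumed to be tightly packed (no row pitch), and averaging each horizontal pair is only one possible downsampling filter. The source does not describe which filter the driver's CUDA kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert planar 8-bit YUV 4:4:4 to semi-planar 4:2:2 (NV16-style layout):
   a full-resolution Y plane followed by one plane of interleaved U,V pairs,
   each pair shared by two horizontally adjacent pixels.
   dst must hold 2 * width * height bytes; width must be even. */
void yuv444_to_nv16(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                    size_t width, size_t height, uint8_t *dst)
{
    assert(width % 2 == 0);
    uint8_t *dst_y = dst;
    uint8_t *dst_uv = dst + width * height;

    for (size_t row = 0; row < height; ++row) {
        /* Luma is copied unchanged. */
        for (size_t col = 0; col < width; ++col)
            dst_y[row * width + col] = y[row * width + col];

        /* Each chroma row holds width / 2 (U, V) pairs, so it is also
           width bytes long and shares the luma row offset. */
        for (size_t col = 0; col < width; col += 2) {
            size_t i = row * width + col;
            /* Average each horizontal pair, rounding to nearest. */
            dst_uv[i]     = (uint8_t)((u[i] + u[i + 1] + 1) / 2);
            dst_uv[i + 1] = (uint8_t)((v[i] + v[i + 1] + 1) / 2);
        }
    }
}
```

The output is twice the size of the luma plane, which matches the 4:2:2 frame size of 2 bytes per pixel for 8-bit content.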
Higher bit-depth encoding enhancement
NVIDIA Video Codec SDK 13.0 introduces 10-bit encoding support in H.264 on NVIDIA Blackwell GPUs. All the chroma subsampling formats—4:2:0, 4:2:2 and 4:4:4—support encoding 10-bit content.
In addition, NVIDIA Blackwell NVENC can encode 8-bit content as 10-bit for H.264 and HEVC, a feature already available for AV1 in ADA.
ADA and earlier GPUs continue to support this feature for HEVC, but unlike NVIDIA Blackwell, the input YUV is upscaled from 8- to 10-bit as a preprocessing step using CUDA.
This feature improves the coding efficiency due to higher precision in the encoding pipeline. This upgrade results in smoother gradations and more accurate color reproduction, ideal for high-quality video production. Many of the input-related calculations in the encoder are done in 10-bit instead of 8-bit. Applications can expect an improvement of around 3% in compression efficiency when using this feature without any significant impact on encoder performance.
Unlike AV1, 10-bit encoding is supported only on select profiles for H.264 and HEVC. Applications should only enable this feature if the decoder supports 10-bit profiles.
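As a rough illustration of the 8-bit to 10-bit upscaling step mentioned above, the sketch below widens each sample by two bits and stores it in a 16-bit word. The function names are hypothetical. The sketch assumes the common convention for 16-bit containers in which the 10 significant bits sit in the upper bits of the word, and a plain left shift is only one way to widen a sample. The source does not specify the exact conversion that the driver or the hardware performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen an 8-bit sample to 10 bits by shifting left two places
   (0..255 maps to 0..1020). */
uint16_t widen_8_to_10(uint8_t sample)
{
    return (uint16_t)(sample << 2);
}

/* Store a 10-bit value in a 16-bit word with the value in the upper bits
   (assumed container convention for this sketch). */
uint16_t pack_10_in_16(uint16_t value10)
{
    assert(value10 < 1024);
    return (uint16_t)(value10 << 6);
}

/* Convert a buffer of 8-bit samples to 16-bit words holding 10-bit values. */
void convert_plane_8_to_10(const uint8_t *src, uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = pack_10_in_16(widen_8_to_10(src[i]));
}
```

On GPUs that need this preprocessing, the same per-sample operation would run over the Y plane and the interleaved chroma plane before the frame is handed to the encoder.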
H.264 interlaced encoding
NVIDIA Blackwell GPUs bring back support for encoding interlaced content, improving flexibility for legacy workflows that rely on traditional broadcast video formats.
Interlaced encoding involves splitting a frame into two fields. The first field includes the odd lines of the image, while the second field includes the even lines. These fields are transmitted sequentially at a rapid rate, creating the illusion of a single frame. Field encoding is supported for YUV 4:2:0 and YUV 4:2:2, 8- and 10-bit content on H.264.
NVENCODE API supports both top field first and bottom field first layouts for interlaced content.
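The field split described above can be sketched for a single 8-bit plane as follows. This is an illustration only: the function name is hypothetical, rows are counted from 0 (so the top field takes rows 0, 2, 4, and so on), and the plane is assumed to be tightly packed. It does not show how fields are submitted to the NVENCODE API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Split a progressive plane into two fields. Rows are counted from 0 here, so
   the top field takes rows 0, 2, 4, ... and the bottom field rows 1, 3, 5, ...
   Each field buffer must hold width * (height / 2) bytes; height must be even. */
void split_fields(const uint8_t *frame, size_t width, size_t height,
                  uint8_t *top_field, uint8_t *bottom_field)
{
    assert(height % 2 == 0);
    for (size_t row = 0; row < height; ++row) {
        uint8_t *field = (row % 2 == 0) ? top_field : bottom_field;
        /* Row r of the frame becomes row r / 2 of its field. */
        memcpy(field + (row / 2) * width, frame + row * width, width);
    }
}
```

Whether the top or the bottom field is presented first is a property of the content, which is why both orders are supported.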
Lookahead level and UHQ
Lookahead level analyzes future frames and enables efficient allocation of bits to different blocks within a frame, based on how much a given block is referenced in subsequent frames. The statistics generated during lookahead encoding are used for complexity estimation in rate control.
Video Codec SDK 13.0 adds support for lookahead level in AV1 and introduces the AV1 UHQ tuning info, which combines lookahead level and temporal filtering to provide the best quality and performance for various HQ latency-tolerant encoding presets. These features provide best-in-class visual quality for demanding video applications.
In the UHQ tuning, the optimal settings for lookahead and temporal filtering are applied in combination rather than individually. As with UHQ HEVC, the number of B-frames is set to five, while using the middle B-frame as the reference. UHQ mode also disables adaptive I- and B-frames and uses a fixed GOP (group of pictures) structure.
NVIDIA Blackwell NVENC supports new encode statistics that enable the lookahead algorithm to identify referencing details across multiple references. As a result, the UHQ tuning info delivers much better quality and performance for both HEVC and AV1 than on ADA.
For AV1, the UHQ tuning info in NVIDIA Blackwell has seven B-frames, instead of five, further enhancing the compression efficiency.
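The fixed structure described above, a run of B-frames with the middle one used as a reference, can be pictured with a short sketch. The names below are hypothetical, and the sketch assumes each mini-GOP is closed by a P-frame in display order. The source does not spell out the exact frame layout that NVENC produces.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FRAME_B, FRAME_B_REF, FRAME_P } FrameType;

/* Fill types[0 .. num_b] for one mini-GOP in display order: num_b B-frames
   followed by the frame that closes the group (assumed here to be a P-frame).
   The middle B-frame is marked as a reference. num_b must be odd so that the
   middle is unambiguous. */
void fill_mini_gop(FrameType *types, size_t num_b)
{
    assert(num_b % 2 == 1);
    for (size_t i = 0; i < num_b; ++i)
        types[i] = (i == num_b / 2) ? FRAME_B_REF : FRAME_B;
    types[num_b] = FRAME_P;
}
```

With five B-frames, the third B-frame is the reference. With seven, it is the fourth.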
Figures 1 and 2 show the bitrate savings for ADA HQ, ADA UHQ, NVIDIA Blackwell HQ, and NVIDIA Blackwell UHQ for HEVC and AV1, respectively.


Enhanced video decoding capabilities
NVIDIA Blackwell GPUs bring significant advancements to the world of video decoding, particularly in H.264 and HEVC formats. These enhancements offer both feature set and performance improvements, setting new standards in the industry.
H.264 decoding enhancements
With NVIDIA Blackwell, the H.264 decoding capabilities have undergone major improvements. Some of the key features include the following:
- 4:2:0 10-bit support (excluding MBAFF): This provides improved color depth, making it ideal for working in color-sensitive fields such as video production or gaming.
- 4:2:2 8/10-bit support (excluding MBAFF): With support for 4:2:2 chroma sampling at both 8-bit and 10-bit depths, NVIDIA Blackwell can decode content that preserves more color detail than 4:2:0, which provides better visual fidelity.
- 2x performance improvement: Perhaps the most exciting update is the performance boost. NVIDIA Blackwell offers a 2x performance improvement compared to previous generations, which means smoother video playback and faster decoding even for the most demanding video files.
- Resolution support up to 8192×8192: Whether you’re working with ultra-high-definition video or cutting-edge 3D content, NVIDIA Blackwell has the ability to handle resolutions up to 8192×8192. This means that you can decode videos with greater clarity and detail.
These improvements ensure that NVIDIA Blackwell delivers top-tier video decoding, whether you’re working on high-resolution video projects or handling large-scale video processing tasks.
HEVC decoding for enhanced flexibility and speed
High-Efficiency Video Coding (HEVC) has become the go-to format for efficient video compression, and NVIDIA Blackwell takes it to the next level. The new enhancements include the following:
- Support for 4:2:2 8/10-bit and 12-bit decoding: NVIDIA Blackwell now offers a wider range of decoding options for HEVC, making it easier to process high-quality video with minimal loss of fidelity.
- Performance improvements: Thanks to improvements in the NVDCLK, you can expect noticeably higher performance when decoding HEVC content. This translates into smoother playback and more efficient video rendering.

The NVDECODE API introduces two new formats for 4:2:2 decode output:
cudaVideoSurfaceFormat_NV16=4, /**< Semi-Planar YUV 422 [Y plane followed by interleaved UV plane] */
cudaVideoSurfaceFormat_P216=5 /**< 16 bit Semi-Planar YUV 422[Y plane followed by interleaved UV plane]*/
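An application typically picks the output surface format from the chroma format and bit depth of the stream. The sketch below shows one way to do that. It uses local, hypothetical enum and function names so that it compiles without the SDK headers. The values 4 and 5 match the two new entries above, and the remaining values are assumed to mirror the existing cudaVideoSurfaceFormat entries.

```c
#include <assert.h>

/* Local stand-ins for the stream's chroma format (not SDK types). */
typedef enum { CHROMA_FMT_420, CHROMA_FMT_422, CHROMA_FMT_444 } ChromaFmt;

/* Local mirror of the decode output formats; 4 and 5 are the new
   4:2:2 entries, the rest are assumed to match the existing enum. */
typedef enum {
    SURFACE_NV12 = 0,         /* 8-bit semi-planar 4:2:0 */
    SURFACE_P016 = 1,         /* 16-bit semi-planar 4:2:0 */
    SURFACE_YUV444 = 2,       /* 8-bit planar 4:4:4 */
    SURFACE_YUV444_16BIT = 3, /* 16-bit planar 4:4:4 */
    SURFACE_NV16 = 4,         /* 8-bit semi-planar 4:2:2 */
    SURFACE_P216 = 5          /* 16-bit semi-planar 4:2:2 */
} SurfaceFormat;

/* Choose an output format: 8-bit streams use the 8-bit layout, and anything
   deeper (10-bit, 12-bit) uses the 16-bit layout for the same chroma format. */
SurfaceFormat pick_surface_format(ChromaFmt chroma, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 16);
    int high = bit_depth > 8;
    switch (chroma) {
    case CHROMA_FMT_422: return high ? SURFACE_P216 : SURFACE_NV16;
    case CHROMA_FMT_444: return high ? SURFACE_YUV444_16BIT : SURFACE_YUV444;
    case CHROMA_FMT_420: break;
    }
    return high ? SURFACE_P016 : SURFACE_NV12;
}
```

A real application would also confirm through the decoder capability query that the chosen output format is supported on the target GPU.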
Dynamic decode surface allocation for GPU memory efficiency
One of the standout features in Video Codec SDK 13.0 is the introduction of dynamic decode surface allocation. By adapting to varying video bitstreams, this capability reduces unnecessary memory consumption, which allows a higher number of concurrent decode sessions. This improvement is crucial for optimizing GPU memory usage in a few video-decoding use cases.
In some cases, the bitstream may use fewer reference frames than the DPB size suggests, wasting valuable video memory. Allocating the maximum number of decode surfaces up front results in a larger memory footprint.
Comparing the new SDK
Before this release, video applications created the decoder object with a fixed minimum number of surfaces based on the DPB size. This approach, while functional, sometimes allocated more memory than necessary.
CUvideodecoder hDecoder = NULL;
CUVIDDECODECREATEINFO stDecodeCreateInfo;
memset(&stDecodeCreateInfo, 0x0, sizeof(CUVIDDECODECREATEINFO));
. . . // Set up the remaining structure members
stDecodeCreateInfo.ulNumDecodeSurfaces = <dpb_size>; // Prior to SDK 13.0, this could not change
CUresult rResult = cuvidCreateDecoder(&hDecoder, &stDecodeCreateInfo);
With Video Codec SDK 13.0, you gain the flexibility to allocate extra YUV surfaces only when needed. You can create a decoder object with a smaller initial allocation of YUV surfaces (such as 3 or 4) and use the cuvidReconfigureDecoder API to allocate more surfaces dynamically as needed. This dynamic allocation reduces unnecessary memory consumption and enhances the overall efficiency of the decoding process.
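The grow-on-demand pattern described above can be sketched as simple bookkeeping. The SurfacePool type and both functions are hypothetical and exist only for illustration. The point where a real application would call cuvidReconfigureDecoder is marked with a comment, and the call itself is not shown because its parameters are not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a decoder's surface pool; not an SDK type. */
typedef struct {
    size_t allocated;   /* surfaces currently allocated */
    size_t max_allowed; /* upper bound, for example the DPB size */
    size_t reconfigs;   /* how many times the pool had to grow */
} SurfacePool;

/* Make sure at least `needed` surfaces exist, growing only on demand.
   Returns 0 on success, -1 if the request exceeds max_allowed. */
int ensure_surfaces(SurfacePool *pool, size_t needed)
{
    assert(pool != NULL);
    if (needed > pool->max_allowed)
        return -1;
    if (needed > pool->allocated) {
        /* A real application would call cuvidReconfigureDecoder here
           to allocate the additional surfaces. */
        pool->allocated = needed;
        pool->reconfigs++;
    }
    return 0;
}

/* Bytes held by the pool for a given per-surface size. */
size_t pool_bytes(const SurfacePool *pool, size_t bytes_per_surface)
{
    assert(pool != NULL);
    return pool->allocated * bytes_per_surface;
}
```

Starting with four surfaces and growing to six only when the bitstream needs them keeps the footprint at six surfaces, instead of the full DPB-sized allocation made up front.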
Unlocking new possibilities with Video Codec SDK 13.0
NVIDIA Video Codec SDK 13.0 pushes the boundaries of video encoding and decoding with the latest NVIDIA Blackwell GPUs. Whether you’re creating content for broadcast, editing high-quality video, or working with the latest 8K footage, this update offers the tools you need to elevate your workflows.
With improvements in compression efficiency, support for new color formats, enhanced encoding quality, and more, Video Codec SDK 13.0 is designed to meet the growing demands of modern video applications.
In support of the Video Codec SDK 13.0 launch, NVIDIA partners Blackmagic, CapCut, and Wondershare have already integrated features such as 4:2:2 encode, 4:2:2 decode, AV1 UHQ, and split encoding in their video pipelines.