NVIDIA VIDEO CODEC SDK - Get Started

If you want to use the dedicated decoding/encoding hardware on your GPU in an existing application, you can leverage the integration already available in FFmpeg. FFmpeg is suited to evaluation or quick integration, but it may not expose every encoder parameter. Use the NVDECODE and NVENCODE APIs when you need low-level, granular control over encode/decode parameters or want to tap directly into the hardware decoder/encoder. This access is available through the Video Codec SDK.

Video Codec for application developers

Video Codec SDK 13.0 Header Files

FFmpeg is a cross-platform solution to record, convert, and stream audio and video. FFmpeg supports video hardware acceleration on NVIDIA GPUs.

Hardware acceleration for the most popular video framework.
Leverages FFmpeg’s audio codecs, stream muxing, and RTP protocols.
Available for Windows and Linux.
You can now use FFmpeg to accelerate video encoding and decoding using NVENC and NVDEC, respectively.


What’s new in Video Codec SDK 13.0

Blackwell features:

422, 422i, and 420i 8-bit/10-bit H.264 encode and decode support.
422 8-bit/10-bit HEVC encode and decode support.
Multi-view HEVC (MV-HEVC) 420 8-bit/10-bit support.
Ultra-high quality (UHQ) mode for AV1.
Twice the H.264 decode throughput per NVDEC compared with the previous generation.
Decode memory optimizations.


System Requirements for Video Codec SDK 13.0

Operating System: Windows 10 and 11, Windows Server 2008 R2, Windows Server 2012, and Linux
Dependencies:
NVIDIA Quadro, Tesla, GRID, or GeForce products (see the GPU Support Matrix)
NVIDIA Windows display driver 570.0 or newer
NVIDIA Linux display driver 570.0 or newer
DirectX SDK (Windows only)
CUDA Toolkit

Documentation and Samples

Online Documentation

For convenience, NVDECODE API documentation and sample applications are also included in the CUDA Toolkit, in addition to the Video Codec SDK download package.

Note: For Video Codec SDK 7.0 and later, NVCUVID has been renamed to NVDECODE API.

Developer Forums

Our forum community is where developers can ask questions, share experiences, and participate in discussions with NVIDIA and other experts in the field.

FAQ

Q: How do I choose the right driver version to install?

A: After you download the SDK, refer to the "ReadMe.txt" file, which lists the minimum required display driver version. Install a driver that meets this requirement; otherwise the SDK will fail to initialize. The driver version is the first thing to check if you hit an initialization failure.
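A minimal sketch of that check in C: compare the installed driver version against the minimum listed in ReadMe.txt. The function name is hypothetical, and obtaining the installed version string (for example from nvidia-smi or NVML's nvmlSystemGetDriverVersion()) is left out so the sketch stays self-contained.

```c
#include <stdlib.h>

/* Compare two dotted driver version strings numerically, component by
   component (e.g. "570.86.16" vs "570.0"). Returns a negative value, zero,
   or a positive value, like strcmp(). Missing components count as 0. */
int compare_driver_versions(const char *a, const char *b)
{
    while (*a != '\0' || *b != '\0') {
        char *end_a;
        char *end_b;
        unsigned long va = strtoul(a, &end_a, 10);
        unsigned long vb = strtoul(b, &end_b, 10);

        if (va != vb)
            return va < vb ? -1 : 1;

        /* Stop on input that is neither a number nor a '.', to avoid looping. */
        if (end_a == a && end_b == b)
            break;

        /* Skip the '.' separator, if any, and move to the next component. */
        a = (*end_a == '.') ? end_a + 1 : end_a;
        b = (*end_b == '.') ? end_b + 1 : end_b;
    }
    return 0;
}
```

For example, a negative result from compare_driver_versions(installed, "570.0") means the installed driver is older than the 570.0 minimum listed for Video Codec SDK 13.0.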

Q: Is there a way to know if the feature is supported on the GPU I have?

A: The support matrix is available at https://developer.nvidia.com/video-encode-decode-gpu-support-matrix. The client application should also query the capabilities using the respective capability APIs before enabling any feature.
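A minimal sketch of gating a decode session on queried capabilities, assuming the relevant CUVIDDECODECAPS fields have already been filled in by cuvidGetDecoderCaps() for the stream's codec, chroma format, and bit depth. To compile without the SDK headers, it uses a hypothetical mirror struct holding a subset of those fields with simplified types. The macroblock count is rounded up here to be conservative; the SDK's own decoder sample may round differently. On the encode side, the corresponding query is NvEncGetEncodeCaps().

```c
#include <stdbool.h>

/* Hypothetical mirror of a subset of CUVIDDECODECAPS (cuviddec.h), with
   simplified types. In a real application these values come from
   cuvidGetDecoderCaps() for the stream's codec, chroma format, and bit depth. */
typedef struct {
    unsigned char bIsSupported; /* codec/chroma/bit-depth combination supported */
    unsigned int nMaxWidth;
    unsigned int nMaxHeight;
    unsigned int nMinWidth;
    unsigned int nMinHeight;
    unsigned int nMaxMBCount;   /* max number of 16x16 macroblocks per frame */
} decode_caps;

/* Return true only if a width x height stream fits within the reported caps. */
bool stream_fits_decoder(const decode_caps *caps, unsigned int width,
                         unsigned int height)
{
    if (!caps->bIsSupported)
        return false;
    if (width > caps->nMaxWidth || height > caps->nMaxHeight)
        return false;
    if (width < caps->nMinWidth || height < caps->nMinHeight)
        return false;

    /* Macroblock count, rounding partial 16x16 blocks up. */
    unsigned long mb_count =
        (unsigned long)((width + 15) / 16) * ((height + 15) / 16);
    return mb_count <= caps->nMaxMBCount;
}
```

If the check fails, the application can fall back to another path instead of letting decoder creation fail.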

Q: How do I write an efficient application and what are the right settings I should use?

A: We strongly recommend that all application developers read the programming guides in detail before writing an application. For hints on this question in particular, see the following sections of the documentation. They provide valuable tips for optimizing latency and memory utilization and for choosing the right settings for different use cases.

"Recommended NVENC Settings" in NVENCODE API Programming Guide, included in the Video Codec SDK
"Writing an Efficient Decode Application" in NVDECODE API Programming Guide, included in the Video Codec SDK

Q: How do I encode at low latency using NVENC?

A: Video encoding latency consists of two components: (a) latency due to the encoding algorithm (e.g. B-frames, look-ahead, VBV buffering), and (b) latency due to the processing required to encode the bits in hardware or software. For a typical end-to-end streaming scenario to achieve low latency, both components should be reduced as much as possible. Latency in (a) can typically be minimized by choosing an infinite GOP with an IPPPP... structure, no look-ahead, and the smallest VBV buffer possible for the given bitrate and available channel bandwidth, without sacrificing too much encoding quality. Each of these can be set via the NVENCODE API. In Video Codec SDK 10.0 and above, setting the tuning info to low-latency or ultra-low-latency sets most of these parameters automatically. Latency contributed by (b) can be minimized by choosing the right preset and rate control mode, with the right number of rate control passes. Naturally, 2-pass rate control takes more time to encode than 1-pass rate control. Running the first pass at quarter resolution takes less time than running both passes at full resolution.
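A sketch of the VBV arithmetic behind component (a): low-latency guidance for NVENC suggests a very small VBV buffer, such as a single frame (bitrate / frame rate). The helpers below, whose names are hypothetical, compute that size and the worst-case buffering delay it implies. In an NVENC application the resulting size would typically go into NV_ENC_RC_PARAMS::vbvBufferSize (and vbvInitialDelay).

```c
#include <stdint.h>

/* VBV buffer size, in bits, that holds `frames` average-sized frames.
   frames = 1 gives the single-frame buffer (bitrate / frame rate) often
   suggested for low-latency streaming. Frame rate is fps_num / fps_den. */
uint32_t vbv_size_bits(uint32_t bitrate_bps, uint32_t fps_num,
                       uint32_t fps_den, uint32_t frames)
{
    /* Multiply before dividing so fractional rates like 30000/1001 stay exact. */
    uint64_t bits_per_frame = (uint64_t)bitrate_bps * fps_den / fps_num;
    return (uint32_t)(bits_per_frame * frames);
}

/* Worst-case delay added by a full VBV buffer draining at the channel
   bitrate, in milliseconds. */
double vbv_delay_ms(uint32_t vbv_bits, uint32_t bitrate_bps)
{
    return 1000.0 * (double)vbv_bits / (double)bitrate_bps;
}
```

At 6 Mbps and 30 fps, a single-frame VBV is 200,000 bits and adds roughly 33 ms of worst-case delay; a four-frame buffer would add about four times that.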

In addition to the above, overall encoding latency is also affected by application design. Since NVENC can run in parallel with CUDA and graphics workloads, it is important to keep the NVENC pipeline fed with data and to minimize context switches between NVENC pre-processing (which uses a small amount of CUDA bandwidth) and other graphics/CUDA workloads. The specifics depend on the workload and should be analyzed with a tool such as GPUView (part of the Windows Performance Toolkit).
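One generic way to keep the encoder fed is to allow a few frames in flight: submit the next frame while earlier ones are still being encoded, and only drain output when the pipeline is full. The sketch below shows just that bookkeeping as a fixed-size FIFO. It is a hypothetical illustration, not an SDK API, and MAX_IN_FLIGHT is an assumed depth.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_IN_FLIGHT 4 /* hypothetical pipeline depth */

/* Frames submitted to the encoder whose bitstream has not been retrieved yet. */
typedef struct {
    int frame_ids[MAX_IN_FLIGHT];
    size_t head;  /* index of the oldest in-flight frame */
    size_t count; /* number of frames currently in flight */
} inflight_queue;

/* Record a newly submitted frame. Returns false when the pipeline is full,
   which is the point where the application must drain output first. */
bool inflight_push(inflight_queue *q, int frame_id)
{
    if (q->count == MAX_IN_FLIGHT)
        return false;
    q->frame_ids[(q->head + q->count) % MAX_IN_FLIGHT] = frame_id;
    q->count++;
    return true;
}

/* Retire the oldest frame once its output is available (FIFO order). */
bool inflight_pop(inflight_queue *q, int *frame_id)
{
    if (q->count == 0)
        return false;
    *frame_id = q->frame_ids[q->head];
    q->head = (q->head + 1) % MAX_IN_FLIGHT;
    q->count--;
    return true;
}
```

A deeper queue hides more submission jitter but adds up to one frame of latency per extra slot, so low-latency applications keep it shallow.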

Q: How do I measure performance?

A: The Video Codec SDK provides samples specifically designed to deliver optimal performance. Refer to the applications with the suffix "…perf" in the Video Codec SDK; you can run these applications to measure maximum throughput. The samples in the Optical Flow SDK are also optimized for performance.

Q: All of NVIDIA's Turing-generation GPUs contain one NVENC engine, whereas Pascal and some earlier-generation GPUs contain multiple NVENC engines. Why did NVIDIA choose to reduce the number of NVENC engines and regress the encoding performance per GPU?

A: First, it is important to note that the aggregate video encoding performance of a GPU is not defined solely by the raw number of NVENCs on the GPU silicon. As anyone familiar with video encoding knows, discussing encoding performance without reference to encoding quality is meaningless. For example, one could encode a video at blazing speed with no regard for quality and claim extremely high performance, doubling that performance on GPUs with multiple NVENC engines. But such usage is of little use in practice. It is therefore important to think of encoding performance at a specific quality. NVIDIA's encoding benchmarks measure encoding quality as the bitrate savings relative to the medium-preset output of the open-source encoders x264 and x265. The resulting performance-versus-quality spectrum is published for various GPU generations on the Video Codec SDK website. Most of the commonly used presets on Pascal have an equivalent preset on Turing with similar quality and 2x the performance, which makes it possible to get the same performance from both GPU generations despite Turing GPUs having only one NVENC engine. This requires the application to choose appropriate encoding settings for the GPU in use. For latency-tolerant encoding, Turing NVENC provides equivalent settings that achieve higher performance per NVENC than Pascal NVENC. For latency-sensitive (low-latency) encoding, Turing NVENC does not provide 2x performance, but that is not needed, because most low-latency scenarios are bottlenecked by graphics/CUDA utilization rather than NVENC utilization.

In short, despite the reduction in the number of NVENCs from Pascal to Turing, you should be able to achieve equivalent encoding performance per GPU in most practical use cases by adjusting the encoding settings to normalize the encoding quality.
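The argument above amounts to comparing aggregate throughput at a fixed quality Q. As an illustration only, assume a Pascal GPU with two NVENC engines, each delivering throughput T at quality Q, and a Turing preset of similar quality with 2x the per-engine throughput:

```latex
\begin{aligned}
T_{\text{GPU}}(Q) &= N_{\text{NVENC}} \cdot T_{\text{NVENC}}(Q) \\
T_{\text{Pascal}}(Q) &= 2 \cdot T \\
T_{\text{Turing}}(Q) &= 1 \cdot 2T = 2T
\end{aligned}
```

Under these assumptions both GPUs deliver the same aggregate throughput at the same quality, even though Turing has half as many engines.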

Q: Since every Video Codec SDK release requires a minimum display driver version to be installed on the system, how can I write an application that works across devices with different display driver versions?

A: The NVENCODE API exposes functions that let applications query the maximum API version supported by the underlying driver. Depending on that maximum version, the application can choose at runtime a code path compiled against the appropriate API version.
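A minimal sketch of the runtime selection step, assuming the application ships code paths built against several NVENC API versions. In a real application, driver_max_packed would be the value returned by NvEncodeAPIGetMaxSupportedVersion(); that call is omitted so the sketch compiles without the SDK. The packing (major << 4) | minor follows the comparison in the SDK's NvEncoder sample; verify it against the nvEncodeAPI.h you build with. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a version for comparison with the driver's maximum supported
   version: (major << 4) | minor. Assumes minor < 16. */
uint32_t pack_api_version(uint32_t major, uint32_t minor)
{
    return (major << 4) | (minor & 0xFu);
}

/* Hypothetical: an API version this application has a code path for. */
typedef struct {
    uint32_t major;
    uint32_t minor;
} api_version;

/* Return the index of the newest compiled-in version the driver supports,
   or -1 if the driver is older than all of them. */
int select_code_path(uint32_t driver_max_packed, const api_version *paths,
                     size_t count)
{
    int best = -1;
    uint32_t best_packed = 0;
    for (size_t i = 0; i < count; ++i) {
        uint32_t packed = pack_api_version(paths[i].major, paths[i].minor);
        if (packed <= driver_max_packed && (best < 0 || packed > best_packed)) {
            best = (int)i;
            best_packed = packed;
        }
    }
    return best;
}
```

A result of -1 is the case where the application should ask the user to upgrade the driver or fall back to a non-NVENC path.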

Q: How do I find out how many streams of a specific resolution and frame rate (e.g. 1080p30) a given GPU can encode or decode?

A: For the decoder, refer to the NVDEC application note included in the SDK documentation for an idea of performance. For the encoder, the answer depends on many factors, including the GPU in use and its clock speed, the encoding settings (i.e. encode quality), the available memory bandwidth, and the application design. Note in particular that GPU encoding performance is always tied to encoding quality, and performance can vary greatly with the chosen settings. For example, B-frames, 2-pass rate control, or look-ahead improve encoding quality at the cost of performance. Encoding presets also significantly influence the quality-versus-performance trade-off. Refer to the table of indicative encoder performance figures in the NVENC application note included in the SDK package.
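As a rough planning aid, the sketch below converts an aggregate throughput measured at one resolution (for example with the SDK's perf samples, using the exact settings you plan to ship) into a stream count at another resolution. It assumes throughput scales linearly with pixel rate at fixed settings, which is a simplification; the function name and the numbers in the usage note are hypothetical.

```c
#include <stdint.h>

/* Estimate how many (width x height @ target_fps) streams fit within a
   throughput of measured_fps frames/s observed at ref_w x ref_h, assuming
   throughput scales linearly with pixels per frame. */
unsigned int estimate_streams(double measured_fps, uint32_t ref_w,
                              uint32_t ref_h, uint32_t width,
                              uint32_t height, double target_fps)
{
    double ref_pixels = (double)ref_w * ref_h;
    double target_pixels = (double)width * height;
    double fps_at_target = measured_fps * ref_pixels / target_pixels;
    /* Truncate: a partial stream cannot be served in real time. */
    return (unsigned int)(fps_at_target / target_fps);
}
```

For instance, a hypothetical 1000 fps measured at 1080p corresponds to about 33 streams of 1080p30, or about 8 streams of 2160p30 under the linear-scaling assumption.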

Q: The NVDECODEAPI and NVENCODEAPI use CUDA for certain helper functionality during encode and decode. How do I ensure that the CUDA operations in NVDECODEAPI and NVENCODEAPI run on a separate stream, so that my application's CUDA workload is not blocked by the CUDA workload inside NVDECODEAPI and/or NVENCODEAPI?

A: Create separate CUDA streams for encode and decode. For NVDECODEAPI and NVENCODEAPI, you can specify the stream on which the CUDA kernels run using CUVIDPROCPARAMS::output_stream and NvEncSetIOCudaStreams(..), respectively.

Additional Resources

Download older (legacy) versions of the NVENC SDK and Video Codec SDK
Download CUDA Toolkit
Download FFmpeg
Download Video Test Sources (YUV RAW 1080p Files - Heavy Hand video input)

Blog - Optimizing Video Memory Usage with NVDECODE API and NVIDIA Video Codec SDK
Blog - Turing H.264 Video Encoding Speed and Quality
Blog - New GeForce-Optimized OBS and RTX Encoder Enables Pro-Quality Broadcasting on a Single PC

GitHub Streamline live streaming system reference design