NVIDIA VIDEO CODEC SDK - Get Started
If you are looking to make use of the dedicated decoding/encoding hardware on your GPU in an existing application you can leverage the integration already available in FFmpeg. FFmpeg should be used for evaluation or quick integration, but it may not provide control over every encoder parameter. NVDECODE and NVENCODE APIs should be used for low-level granular control over various encode/decode parameters and if you want to directly tap into the hardware decoder/encoder. This access is available through the Video Codec SDK.
Video Codec SDK 12.1 Header Files
FFMPEG is a cross-platforms solution to record, convert, and stream audio and video. FFMPEG supports video hardware acceleration on NVIDIA GPUs.
- Hardware acceleration for most popular video framework.
- Leverages FFmpeg’s Audio codec, stream muxing, and RTP protocols.
- Available for Windows, Linux.
- You can now use FFMPEG to accelerate video encoding and decoding using NVENC and NVDEC, respectively.
What’s new in Video Codec SDK 12.1
- NVENC Low Level API for H.264, HEVC and AV1 encoders:
- Iterative Encoding: Encoding the same frame multiple times with different QP values and without advancing encoder state.
- ReCon: provide access to NVENC reconstructed frame.
- NVENC Low Level Stats: encoded frame output stats at row and block level.
- External lookahead: API to invoke lookahead explicitly.
- CABR (Content Adaptive Bit Rate) acceleration using NVENC Low Level API
- Explicit Split Frame encoding: enable/disable split frame encoding or use default automatic mode.
- Sample application to demonstrate accelerated file compression (AppEncMultiInstance) by using multiple NVENCs.
- Click here for additional information.
System Requirements for Video Codec SDK 12.1
|Operating System||Windows 10 and 11, Server 2008 R2, Server 2012, and Linux|
|Dependencies||NVIDIA Quadro, Tesla, GRID or GeForce products.
GPU Support Matrix
NVIDIA Windows display driver 531.61 or newer
NVIDIA Linux display driver 530.41.03 or newer Get the most Recent NVIDIA Display Driver
DirectX SDK (Windows only) CUDA Toolkit
Documentation and Samples
Our forum community is where Developers can ask questions, share experiences and participate in discussions with NVIDIA and other experts in the field. Check out the forums here.
A: After you download the SDK, please refer to the "ReadMe.txt" which lists the minimum required display driver version. You need to install the right drivers or else the SDK will fail to start and this is the first thing you should check in case there is an initialization failure.
A: The support matrix is listed https://developer.nvidia.com/video-encode-decode-gpu-support-matrix. The client application should also query the capabilities using the respective capability APIs before enabling any feature.
A: We strongly recommend all application developers to go through the programming guides in detail before writing any application. In particular, for some hints on this question, please go through the following sections in the documentation. These sections provide valuable tips for optimizing latency/memory utilization and choosing the right settings for different use-cases.
- "Recommended NVENC Settings" in NVECODE API Programming Guide, included in the Video Codec SDK
- "Writing an Efficient Decode Application" in NVDECODE API Programming Guide, included in the Video Codec SDK
A: Video encoding latency consists of two components: (a) Latency due to encoding algorithm (e.g. B-frames, look-ahead, VBV buffering), and (b) Latency due to the processing required to encode the bits using hardware or software. For a typical end-to-end streaming scenario to incur low latency, it is important to lower both components as much as possible. Typically, latency in (a) can be minimized by choosing infinite GOP with IPPPP... structure, no look-ahead and lowest possible VBV buffer for the given bitrate and available channel bandwidth, without giving away too much of encoding quality. Each of these can all be set via NVENCODE API. In Video SDK 10.0 and above, setting the tuning info to low-latency or ultra-low-latency will set most of these parameters automatically. Latency contributed by (b) can be minimized by choosing the correct preset, and rate control mode with correct number of rate control passes. Naturally, 2-pass requires more time to encode than 1-pass rate control mode. Running quarter-resolution first pass requires less time than running both passes at full resolution.
In addition to the above, the overall encoding latency is also affected by efficient application design (or lack thereof). Since NVENC can run in parallel to CUDA and graphics workload, it is important to ensure that the NVENC pipeline is kept fed with data and the context switches between NVENC pre-processing (which uses small amount of CUDA bandwidth) and other graphics/CUDA workload are minimized. The specifics of this depend on the workload, but should be analyzed using a tool such as GPUView (available as a part of Windows Performance Toolkit).
A: The Video Codec SDK provides samples specifically designed to give optimal performance. Please refer to applications with suffix "…perf" inside the Video Codec SDK. User can run these applications for measuring maximum throughput. The samples in the Optical Flow SDK are optimized for performance.
A: First of all, it is important to note that the aggregate video encoding performance of GPUs is not solely defined by the raw number of NVENCs on the GPU silicon. As anyone familiar with video encoding will know, talking about video encoding performance without any reference to encoding quality is meaningless. For example, one can encode a video at blazing fast speed, without any regard to quality and claim extremely high performance, doubling the performance on GPUs with multiple NVENC engines. But such usage may not be of much use in practical situations. Therefore, it is important to think of encoding performance at a specific quality. NVIDIA encoding benchmarks use the bitrate savings compared with open source encoders x264 and x265's medium preset output, as a measure of the encoding quality. The performance vs. quality spectrum thus obtained is published for various generations of GPUs on Video Codec SDK web site. Most of the commonly used presets on Pascal have an equivalent preset in Turing with similar quality and 2x performance, thereby making it possible to get the same performance from both GPU generations, despite Turing GPUs having only 1 NVENC engine. This requires the application to choose appropriate encoding settings, depending upon the GPU in use. For low-latency presets and tuning, Turing NVENC provides equivalent settings to achieve higher performance per NVENC than Pascal NVENC for latency tolerant encoding. For latency-sensitive (low-latency) encoding, Turing NVENC does not provide 2x performance, but that's not needed because most of the low-latency scenarios are bottlenecked by the graphics/CUDA utilization and not NVENC utilization.
In short, despite the reduction of number of NVENCs from Pascal to Turing, one should be able to achieve equivalent encoding performance per GPU, in most practical use cases by adjusting the encoding settings to normalize the encoding quality.
A: NVENCODE API expose APIs which allow users to query the maximum API versions supported by the underlying driver. Depending on the maximum API version supported by driver, the application can launch code at runtime compiled with the appropriate API.
A: For decoder, please refer to the NVDEC application note included in the SDK documentation to get an idea about performance. For encoder, the answer depends on many factors, some of which include: GPU in use and its clock speed, settings used for encoding (i.e. encode quality), memory bandwidth available, application design. It is especially important to note that GPU encoding performance is always tied to the encoding quality, and the performance can vary greatly depending upon the chosen settings. For example, B-frames, 2-pass rate control mode, or look-ahead will improve the encoding quality at the cost of performance. Encoding presets also influence quality vs performance trade-off significantly. Please refer to the table containing indicative performance figures for the video encoder in NVENC application note included in the SDK package.
A: Create separate Cuda streams for encode and decode. For NVDECODEAPI and NVENCODEAPI you can specify the stream where you want to Cuda kernels using CUVIDPROCPARAMS::output_stream and NvEncSetIOCudaStreams(..) respectively.
- Download older legacy versions of NVENC SDK and Video Codec SDK
- Download CUDA Toolkit
- Download FFmpeg
- Download Video Test Sources (YUV RAW 1080p Files - Heavy Hand video input)
- Blog - Optimizing Video Memory Usage with NVDECODE API and NVIDIA Video Codec SDK
- Blog - Turing H.264 Video Encoding Speed and Quality
- Blog - New GeForce-Optimized OBS and RTX Encoder Enables Pro-Quality Broadcasting on a Single PC
- GitHub Streamline live streaming system reference design