NVIDIA VIDEO CODEC SDK - Get Started
If you want to use the dedicated decoding/encoding hardware on your GPU in an existing application, you can leverage the integration already available in FFmpeg. FFmpeg is suited to evaluation or quick integration, but it may not provide control over every encoder parameter. Use the NVDECODE and NVENCODE APIs when you need low-level, granular control over encode/decode parameters or want to access the hardware decoder/encoder directly. These APIs are available through the Video Codec SDK.
FFmpeg: a cross-platform solution to record, convert, and stream audio and video. Includes NVIDIA video hardware acceleration.
- Hardware acceleration for the most popular video framework
- Leverages FFmpeg's audio codecs, stream muxing, and RTP protocols
- Available for Windows and Linux
- You can now use FFmpeg to accelerate video encoding and decoding using NVENC and NVDEC, respectively.
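As a rough illustration of the FFmpeg route, the sketch below assembles the arguments for a transcode that decodes on NVDEC and encodes on NVENC. It assumes an FFmpeg build with NVIDIA hardware acceleration enabled. The helper names `buildNvencTranscodeArgs` and `joinArgs`, the file names, and the bitrate are placeholders invented for this example and are not part of FFmpeg or the SDK.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the argument list for a GPU-accelerated
// H.264 transcode. File names and bitrate are supplied by the caller.
std::vector<std::string> buildNvencTranscodeArgs(const std::string& input,
                                                 const std::string& output,
                                                 const std::string& bitrate) {
    assert(!input.empty() && !output.empty() && !bitrate.empty());
    return {
        "ffmpeg",
        "-hwaccel", "cuda",                // decode on the GPU (NVDEC)
        "-hwaccel_output_format", "cuda",  // keep decoded frames in GPU memory
        "-i", input,
        "-c:v", "h264_nvenc",              // encode on the GPU (NVENC)
        "-b:v", bitrate,
        output,
    };
}

// Hypothetical helper: joins the arguments with single spaces for logging.
// It performs no shell quoting, so pass the vector itself (not this string)
// to whatever process-spawning API the application uses.
std::string joinArgs(const std::vector<std::string>& args) {
    std::string line;
    for (const std::string& arg : args) {
        if (!line.empty()) line += ' ';
        line += arg;
    }
    return line;
}
```

Calling `joinArgs(buildNvencTranscodeArgs("input.mp4", "output.mp4", "5M"))` yields a command line that can be pasted into a terminal for a quick evaluation run.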
What's new in Video Codec SDK 11.1
- DirectX 12 support for H.264 and HEVC encode
- Chroma QP offset support for H.264 and HEVC Encode
- Single slice in frames during intra refresh for H.264 and HEVC Encode
System Requirements for Video Codec SDK 11.1
|Architecture||x64 and ppc64le|
|Operating System||Windows 7, 8, 10, Server 2008 R2, Server 2012, and Linux|
|Dependencies||NVENCODE API: NVIDIA Quadro, Tesla, GRID or GeForce products with Kepler, Maxwell, Pascal and Turing generation GPUs|
|||NVDECODE API: NVIDIA Quadro, Tesla, GRID or GeForce products with Fermi, Kepler, Maxwell, Pascal and Turing generation GPUs|
|||See the GPU Support Matrix|
|||NVIDIA Windows display driver 471.41 or newer|
|||NVIDIA Linux display driver 470.57.02 or newer|
|||DirectX SDK (Windows only)|
|||CUDA 11.0 Toolkit|
|Development Environment||Windows: Visual Studio 2013/2015/2017/2019|
|||Linux: gcc 4.8 or higher|
Documentation and Samples
Our forum community is where Developers can ask questions, share experiences and participate in discussions with NVIDIA and other experts in the field. Check out the forums here.
A: After you download the SDK, refer to "ReadMe.txt", which lists the minimum required display driver version. Without a suitable driver the SDK fails to initialize, so the driver version is the first thing to check when initialization fails.
A: The support matrix is listed at https://developer.nvidia.com/video-encode-decode-gpu-support-matrix. The client application should also query the capabilities using the respective capability APIs before enabling any feature.
A: We strongly recommend that all application developers go through the programming guides in detail before writing any application. For hints on this question in particular, see the following sections in the documentation. They provide valuable tips for optimizing latency and memory utilization and for choosing the right settings for different use cases.
- "Recommended NVENC Settings" in NVENCODE API Programming Guide, included in the Video Codec SDK
- "Writing an Efficient Decode Application" in NVDECODE API Programming Guide, included in the Video Codec SDK
A: Video encoding latency consists of two components: (a) latency due to the encoding algorithm (e.g. B-frames, look-ahead, VBV buffering), and (b) latency due to the processing required to encode the bits using hardware or software. For a typical end-to-end streaming scenario to achieve low latency, both components must be lowered as much as possible. Typically, latency in (a) can be minimized by choosing an infinite GOP with an IPPPP... structure, no look-ahead, and the lowest possible VBV buffer for the given bitrate and available channel bandwidth, without giving up too much encoding quality. All of these can be set via the NVENCODE API. In Video Codec SDK 10.0 and above, setting the tuning info to low-latency or ultra-low-latency sets most of these parameters automatically. Latency contributed by (b) can be minimized by choosing the correct preset and a rate control mode with the correct number of rate control passes. Naturally, 2-pass rate control takes more time to encode than 1-pass. Running a quarter-resolution first pass takes less time than running both passes at full resolution.
In addition to the above, the overall encoding latency is also affected by efficient application design (or lack thereof). Since NVENC can run in parallel with CUDA and graphics workloads, it is important to keep the NVENC pipeline fed with data and to minimize context switches between NVENC pre-processing (which uses a small amount of CUDA bandwidth) and other graphics/CUDA workloads. The specifics depend on the workload, but should be analyzed using a tool such as GPUView (available as part of the Windows Performance Toolkit).
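To make the algorithmic side of this answer concrete, here is a minimal sketch of the settings it describes, assuming that "lowest possible VBV buffer" is taken to mean roughly one frame's worth of bits (bitrate divided by frame rate). `LowLatencySettings` and `makeLowLatencySettings` are hypothetical names for illustration only and are not SDK types. In a real application these values would be mapped onto the NVENCODE API configuration structures described in the programming guide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain struct for illustration; NOT an NVENCODE API type.
struct LowLatencySettings {
    bool infiniteGop;                 // IPPPP... with no periodic I-frames
    std::uint32_t bFrames;            // 0: B-frames add reordering delay
    bool lookAhead;                   // false: look-ahead buffers future frames
    std::uint32_t averageBitRateBps;  // target bitrate in bits per second
    std::uint32_t vbvBufferSizeBits;  // VBV buffer size in bits
};

// Derives a low-latency configuration from a bitrate and a rational frame
// rate (fpsNum / fpsDen). Assumption: the VBV buffer holds a single frame,
// i.e. bitrate / frame rate bits.
LowLatencySettings makeLowLatencySettings(std::uint32_t bitrateBps,
                                          std::uint32_t fpsNum,
                                          std::uint32_t fpsDen) {
    assert(fpsNum > 0 && fpsDen > 0);
    // bitrate / (fpsNum / fpsDen), computed in 64 bits to avoid overflow.
    const std::uint64_t bitsPerFrame =
        static_cast<std::uint64_t>(bitrateBps) * fpsDen / fpsNum;
    assert(bitsPerFrame <= UINT32_MAX);
    return LowLatencySettings{
        true,   // infiniteGop
        0,      // bFrames
        false,  // lookAhead
        bitrateBps,
        static_cast<std::uint32_t>(bitsPerFrame),
    };
}
```

For example, 6 Mbps at 60 fps gives a single-frame VBV of 100,000 bits. Whether a buffer this small is acceptable depends on the quality the application can give up, as the answer notes.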
A: The Video Codec SDK provides samples specifically designed to give optimal performance. Please refer to the applications with the suffix "…perf" inside the Video Codec SDK. You can run these applications to measure maximum throughput. The samples in the Optical Flow SDK are optimized for performance.
A: First of all, it is important to note that the aggregate video encoding performance of a GPU is not solely defined by the raw number of NVENC engines on the GPU silicon. As anyone familiar with video encoding will know, talking about video encoding performance without any reference to encoding quality is meaningless. For example, one can encode a video at blazing fast speed without any regard to quality and claim extremely high performance, doubling the performance on GPUs with multiple NVENC engines. But such usage may not be of much use in practical situations. Therefore, it is important to think of encoding performance at a specific quality.
NVIDIA encoding benchmarks use the bitrate savings compared with the medium preset output of the open source encoders x264 and x265 as a measure of encoding quality. The performance vs. quality spectrum thus obtained is published for various generations of GPUs on the Video Codec SDK web site. Most of the commonly used presets on Pascal have an equivalent preset on Turing with similar quality and 2x performance, making it possible to get the same performance from both GPU generations despite Turing GPUs having only 1 NVENC engine. This requires the application to choose appropriate encoding settings, depending on the GPU in use. For latency-tolerant encoding, Turing NVENC provides equivalent presets and tuning settings that achieve higher performance per NVENC than Pascal NVENC. For latency-sensitive (low-latency) encoding, Turing NVENC does not provide 2x performance, but that is not needed because most low-latency scenarios are bottlenecked by graphics/CUDA utilization and not by NVENC utilization.
In short, despite the reduction in the number of NVENC engines from Pascal to Turing, one should be able to achieve equivalent encoding performance per GPU in most practical use cases by adjusting the encoding settings to normalize the encoding quality.
A: The NVENCODE API exposes functions that allow users to query the maximum API version supported by the underlying driver. Depending on the maximum API version supported by the driver, the application can launch, at runtime, code compiled against the appropriate API version.
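The runtime selection described in this answer can be sketched as follows. The sketch assumes the driver's maximum supported version has already been queried (for example via `NvEncodeAPIGetMaxSupportedVersion`) and that versions are compared in a packed `(major << 4) | minor` form, as the SDK sample code does. `ApiVersion`, `packVersion`, and `selectCodePath` are hypothetical names introduced here, and the block deliberately avoids including any SDK header so that it stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of an API version a code path was compiled against.
struct ApiVersion {
    std::uint32_t major;
    std::uint32_t minor;
};

// Packs a version as (major << 4) | minor so versions compare as integers.
// Assumption: this mirrors the packed form returned by the driver query.
std::uint32_t packVersion(ApiVersion v) {
    assert(v.minor < 16);  // minor must fit in the low 4 bits
    return (v.major << 4) | v.minor;
}

// Returns the index of the newest compiled-in code path that the driver
// supports, or -1 if the driver is older than every compiled-in path.
int selectCodePath(std::uint32_t driverMaxPacked,
                   const std::vector<ApiVersion>& compiledPaths) {
    int best = -1;
    std::uint32_t bestPacked = 0;
    for (std::size_t i = 0; i < compiledPaths.size(); ++i) {
        const std::uint32_t packed = packVersion(compiledPaths[i]);
        if (packed <= driverMaxPacked && (best < 0 || packed > bestPacked)) {
            best = static_cast<int>(i);
            bestPacked = packed;
        }
    }
    return best;
}
```

An application that ships code paths built against several SDK versions could call `selectCodePath` once at startup and dispatch to the chosen path, falling back to a software encoder or an error message when the result is -1.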
A: For decoder, please refer to the NVDEC application note included in the SDK documentation to get an idea about performance. For encoder, the answer depends on many factors, some of which include: GPU in use and its clock speed, settings used for encoding (i.e. encode quality), memory bandwidth available, application design. It is especially important to note that GPU encoding performance is always tied to the encoding quality, and the performance can vary greatly depending upon the chosen settings. For example, B-frames, 2-pass rate control mode, or look-ahead will improve the encoding quality at the cost of performance. Encoding presets also influence quality vs performance trade-off significantly. Please refer to the table containing indicative performance figures for the video encoder in NVENC application note included in the SDK package.
A: Create separate CUDA streams for encode and decode. For the NVDECODE API and the NVENCODE API, you can specify the stream on which you want the CUDA kernels to run using CUVIDPROCPARAMS::output_stream and NvEncSetIOCudaStreams(..), respectively.
- Download older legacy versions of NVENC SDK and Video Codec SDK
- Download CUDA Toolkit
- Download FFmpeg
- Download Video Test Sources (YUV RAW 1080p Files - Heavy Hand video input)
- Blog - Optimizing Video Memory Usage with NVDECODE API and NVIDIA Video Codec SDK
- Blog - Turing H.264 Video Encoding Speed and Quality
- Blog - New GeForce-Optimized OBS and RTX Encoder Enables Pro-Quality Broadcasting on a Single PC
- GitHub Streamline live streaming system reference design