NVIDIA GPUs have supported hardware-accelerated video encoding since the Kepler generation and hardware-accelerated video decoding since the Fermi generation, both exposed through the NVIDIA Video Codec SDK.
The SDK offers great performance and flexibility, but it requires knowledge of C/C++. Another option is to use third-party libraries and applications such as FFmpeg or GStreamer, but building them with hardware acceleration enabled and customizing them for a particular use case again requires C/C++ expertise.
However, hardware-accelerated video features could be useful to a broader audience. VPF (Video Processing Framework) is intended as a simple yet powerful tool for using NVIDIA GPUs to work with video from Python: it uses the NVIDIA Video Codec SDK for flexibility and performance, while giving developers the ease of use inherent to Python.
VPF is a set of C++ libraries and Python bindings that provides full hardware acceleration for video processing tasks such as decoding, encoding, transcoding, and GPU-accelerated color space and pixel format conversion. VPF is CMake-based, open-source, cross-platform software released under the Apache 2.0 license. It relies on the FFmpeg libraries for (de)muxing and on the pybind11 project for building the Python bindings.
VPF exports its C++ video processing classes into the PyNvCodec Python module. To illustrate how easy it is to use, let's start with a quick code snippet that performs fully hardware-accelerated video transcoding on a GPU without copying raw frames between Host and Device:
```python
import PyNvCodec as nvc

gpuID = 0
encFile = "big_buck_bunny_1080p_h264.mov"

nvDec = nvc.PyNvDecoder(encFile, gpuID)
nvEnc = nvc.PyNvEncoder({'preset': 'hq', 'codec': 'h264', 's': '1920x1080'}, gpuID)

# The context manager makes sure the output file is closed when we are done.
with open("big_buck_bunny_1080p.h264", "wb") as xcodeFile:
    while True:
        rawSurface = nvDec.DecodeSingleSurface()
        # The decoder returns an empty Surface (null device pointer) once the input is over.
        if not rawSurface.GetCudaDevicePtr():
            break

        # The frame stays in VRAM: the decoded Surface goes straight to the encoder.
        encFrame = nvEnc.EncodeSingleSurface(rawSurface)
        if encFrame.size:
            xcodeFile.write(bytearray(encFrame))

    # The encoder is asynchronous, so flush it to collect the remaining packets.
    encFrames = nvEnc.Flush()
    for encFrame in encFrames:
        xcodeFile.write(bytearray(encFrame))
```
Despite its simple design, VPF performs well. The transcoding sample shown above is enough to saturate the NVENC unit on an RTX 5000 GPU, as illustrated below.
The Big Buck Bunny sequence contains 14,315 frames and is transcoded in about 32 seconds, which gives roughly 447 fps (14,315 / 32 ≈ 447). This is achieved without any advanced techniques, such as a producer-consumer pattern in which the decoder and encoder run in separate threads and share a queue of decoded frames. Since all the transcoding is done on the GPU, there's no noticeable CPU load.
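The producer-consumer pattern mentioned above can be sketched in plain Python with a bounded queue. This is a minimal, hypothetical sketch: decode_next and encode are placeholder callables rather than VPF API, error handling is minimal, and whether two threads actually speed things up depends on whether the underlying calls release Python's GIL, which this sketch does not establish. In a real VPF pipeline the decoder thread must hand over frames that will not be overwritten (for example NumPy arrays from DecodeSingleFrame), because a Surface returned by a VPF method may be reused on the next call.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the decoded stream


def pipelined_transcode(decode_next, encode, max_queue=8):
    """Decode and encode in two threads that share a bounded queue of frames.

    decode_next(): hypothetical callable returning the next frame, or None at end of stream.
    encode(frame): hypothetical callable returning the encoded result for one frame.
    """
    frames = queue.Queue(maxsize=max_queue)
    results = []

    def producer():
        try:
            while True:
                frame = decode_next()
                if frame is None:
                    break
                frames.put(frame)  # blocks while the consumer is behind
        finally:
            frames.put(_END)  # always release the consumer, even if decoding fails

    def consumer():
        while True:
            frame = frames.get()
            if frame is _END:
                break
            results.append(encode(frame))  # single consumer keeps frame order

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


# Usage with stand-in callables instead of a real decoder and encoder.
source = iter(range(10))
encoded = pipelined_transcode(lambda: next(source, None), lambda frame: frame * 2)
```

The bounded queue caps how many decoded frames can pile up, which matters when each frame occupies megabytes of memory.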
The core of VPF consists of the PyNvDecoder and PyNvEncoder classes, which are Python bindings to the NVIDIA Video Codec SDK. VPF operates on two major data types:
- NumPy arrays for CPU-side data
- The user-transparent Surface class, which represents GPU-side data
Since allocating GPU-side memory objects is complex and heavily influences performance, all VPF methods that return a Surface own it and may reuse a previously returned Surface on the next call. In contrast, VPF methods that return NumPy arrays return a new array instance every time they are called; move constructors are used for this to avoid memory copy overhead.
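These ownership rules have a practical consequence: a result that the method owns and reuses must be copied if it is needed after the next call, whereas a new NumPy array can simply be stored. The sketch below models both behaviours on the CPU with NumPy; ReusingMethod and FreshMethod are hypothetical stand-ins, not VPF types, and copying a real Surface would have to happen on the GPU.

```python
import numpy as np


class ReusingMethod:
    """Hypothetical stand-in for a method that owns its result and reuses it,
    like the VPF methods that return a Surface."""

    def __init__(self, size):
        self._buffer = np.zeros(size, dtype=np.uint8)
        self._calls = 0

    def __call__(self):
        self._calls += 1
        self._buffer[:] = self._calls  # overwrite the same buffer in place
        return self._buffer


class FreshMethod:
    """Hypothetical stand-in for a method that returns a new NumPy array on every call."""

    def __init__(self, size):
        self._size = size
        self._calls = 0

    def __call__(self):
        self._calls += 1
        return np.full(self._size, self._calls, dtype=np.uint8)


reuse = ReusingMethod(4)
first = reuse()
second = reuse()       # `first` is the same object as `second`; both now read 2
kept = reuse().copy()  # copy to keep a result beyond the next call (holds 3s)
reuse()                # overwrites the shared buffer again; `first` now reads 4

fresh = FreshMethod(4)
a, b = fresh(), fresh()  # independent arrays holding 1s and 2s
```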
For simplicity, both the PyNvDecoder and PyNvEncoder classes support only the NV12 pixel format. Other pixel formats are supported through a set of color space and pixel format conversion classes. All conversions are GPU-accelerated and performed in VRAM for better performance.
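Because everything goes through NV12, it helps to know its memory layout: a full-resolution Y (luma) plane followed by a single half-height plane in which U and V samples alternate, so one frame takes width × height × 3/2 bytes. The sketch below splits a flat NV12 buffer into views of its three components. It assumes 8-bit samples, even dimensions and no row padding (pitch equal to width); nv12_size and split_nv12 are illustrative helpers, not VPF functions.

```python
import numpy as np


def nv12_size(width, height):
    # Full-resolution Y plane plus a half-height plane of interleaved U/V samples.
    return width * height * 3 // 2


def split_nv12(buf, width, height):
    """Return (Y, U, V) views of a flat 8-bit NV12 buffer without copying."""
    buf = np.asarray(buf).reshape(-1)
    if buf.size != nv12_size(width, height):
        raise ValueError("buffer size does not match an NV12 frame of this resolution")
    luma = width * height
    y = buf[:luma].reshape(height, width)
    uv = buf[luma:].reshape(height // 2, width)
    u = uv[:, 0::2]  # even bytes of each chroma row are U (Cb)
    v = uv[:, 1::2]  # odd bytes are V (Cr)
    return y, u, v


# A tiny 4x2 frame whose bytes are 0..11, so the plane boundaries are easy to see.
frame = np.arange(nv12_size(4, 2), dtype=np.uint8)
y, u, v = split_nv12(frame, 4, 2)
```

For a 1920x1080 frame this gives 3,110,400 bytes, of which the first 2,073,600 are luma.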
The PyNvDecoder class has five main methods:

| Method | Description |
|---|---|
| DecodeSingleSurface() | Decodes a single frame from the input video and returns a Surface with the decoded pixels. The next time this method is called, the previously returned Surface may be reused. If the frame isn't decoded, the returned Surface is empty: its CUDA device pointer is zero. |
| DecodeSingleFrame() | Decodes a single frame from the input video and returns a NumPy array with the decoded pixels. Each call returns a new NumPy array instance. If the frame isn't decoded, an empty NumPy array is returned. This operation performs a Device-to-Host memory copy. |
| | Returns the decoded frame width. |
| | Returns the decoded frame height. |
| | Returns the decoded frame pixel format. |
Users may mix DecodeSingleSurface and DecodeSingleFrame calls without breaking the decoder's internal state. The decoder class supports the H.264 and H.265 codecs.
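Since DecodeSingleFrame returns a new array on each call and an empty array when nothing was decoded, a decoding loop can store frames directly and stop on the first empty result. The sketch below shows that loop; decode_all_frames and FakeDecoder are hypothetical names, and treating an empty array as end of input is an assumption drawn from the method description. The stand-in decoder lets the code run without a GPU; with VPF you would pass nvc.PyNvDecoder(encFile, gpuID) as in the transcoding example.

```python
import numpy as np


def decode_all_frames(decoder, max_frames=None):
    """Collect frames from any object whose DecodeSingleFrame() returns an
    empty array when no frame was decoded, as described for PyNvDecoder."""
    frames = []
    while max_frames is None or len(frames) < max_frames:
        frame = decoder.DecodeSingleFrame()
        if frame.size == 0:  # empty array: nothing decoded, e.g. end of input
            break
        frames.append(frame)  # safe to keep: each call returns a new array
    return frames


class FakeDecoder:
    """Hypothetical stand-in for nvc.PyNvDecoder so the sketch runs without a GPU."""

    def __init__(self, num_frames, frame_size):
        self._remaining = num_frames
        self._frame_size = frame_size

    def DecodeSingleFrame(self):
        if self._remaining == 0:
            return np.empty(0, dtype=np.uint8)
        self._remaining -= 1
        return np.full(self._frame_size, self._remaining, dtype=np.uint8)


frames = decode_all_frames(FakeDecoder(num_frames=3, frame_size=12))
```

The max_frames cap is worth keeping with real video: every DecodeSingleFrame call copies a full frame to host memory.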
The PyNvEncoder class has six methods:

| Method | Description |
|---|---|
| EncodeSingleSurface() | Takes an NV12 Surface with raw pixels, encodes it and returns the elementary video bitstream as a NumPy array. The encoder is asynchronous, so this method may return an empty array on the first few calls (depending on encoder settings); this is not an error. |
| EncodeSingleFrame() | Takes a NumPy array with raw pixels, encodes it and returns the elementary video bitstream as a NumPy array. The encoder is asynchronous, so this method may return an empty array on the first few calls (depending on encoder settings); this is not an error. |
| Flush() | Flushes the encoder: it does not return until all raw frames in the encoder's queue are encoded, and then returns a list of NumPy arrays with the elementary stream bytes. |
| | Returns the encoded frame width. |
| | Returns the encoded frame height. |
| | Returns the encoded frame pixel format. |
Users may mix EncodeSingleSurface and EncodeSingleFrame calls without breaking the encoder's internal state. PyNvEncoder can also take an input frame of arbitrary resolution and resize it on the GPU on the fly before the actual encoding. The encoder class supports the H.264 and H.265 codecs. Because the encoder is asynchronous and keeps frames in an internal queue, call the Flush method at the end of the encoding session to flush that queue and retrieve the remaining encoded frames.
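The asynchronous behaviour described above means an encoding loop has to tolerate empty results and drain the encoder at the end. The sketch below does this for NumPy input through EncodeSingleFrame and Flush, mirroring the transcoding example. encode_frames and FakeDelayedEncoder are hypothetical names; the stand-in encoder merely holds back a fixed two frames so the code runs without a GPU, whereas the real delay depends on encoder settings. With VPF you would pass an nvc.PyNvEncoder instead.

```python
import io

import numpy as np


def encode_frames(encoder, frames, out):
    """Encode frames with an object exposing EncodeSingleFrame() and Flush(),
    writing the elementary stream to the binary file-like object `out`.
    Returns the number of non-empty packets written."""
    written = 0
    for frame in frames:
        packet = encoder.EncodeSingleFrame(frame)
        if packet.size:  # asynchronous encoder: early calls may return nothing
            out.write(bytearray(packet))
            written += 1
    for packet in encoder.Flush():  # drain frames still queued in the encoder
        out.write(bytearray(packet))
        written += 1
    return written


class FakeDelayedEncoder:
    """Hypothetical stand-in that returns each 'packet' `delay` calls late."""

    def __init__(self, delay=2):
        self._pending = []
        self._delay = delay

    def EncodeSingleFrame(self, frame):
        self._pending.append(frame)
        if len(self._pending) > self._delay:
            return self._pending.pop(0)
        return np.empty(0, dtype=np.uint8)

    def Flush(self):
        remaining, self._pending = self._pending, []
        return remaining


out = io.BytesIO()
frames_in = [np.full(2, i, dtype=np.uint8) for i in range(5)]
count = encode_frames(FakeDelayedEncoder(), frames_in, out)
```

Without the final Flush loop, the last two packets in this example would be lost.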
Below is a list of supported encoder parameters:

| Parameter | Type | Meaning |
|---|---|---|
| | string | Encoding profile; the possible values differ for h264 and hevc. |
| | integer | Size of look-ahead. |
| | integer + unit | VBV initial delay in bits; unit can be 1, K or M. |
| | integer + unit | Average bit rate; unit can be 1, K or M. |
| | integer | Frame rate. |
| preset | string | Encoding preset, for example hq. |
| | integer | QP value for constqp rate control mode. |
| | integer | Min QP value. |
| | integer | Max QP value. |
| | integer | Target constant quality level for VBR mode. |
| | integer | Initial QP value. |
| | (no value) | Enables temporal AQ. |
| | integer + unit | VBV buffer size in bits; unit can be 1, K or M. |
| | integer | Number of consecutive B-frames. |
| | string | Rate control mode. |
| | integer | Enables spatial AQ and sets its strength. |
| | integer + unit | Max bit rate; unit can be 1, K or M. |
| | integer | Length of GOP (Group of Pictures). |
| codec | string | Video codec (h264 or hevc). |
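The transcoding example passes every PyNvEncoder option as a string, including the resolution as 's': '1920x1080'. A small helper can build such a dictionary from ordinary Python values. make_encoder_options is a hypothetical helper, not part of VPF; extra keyword arguments are passed through as strings, and their names must match the encoder's actual parameter names, which this sketch does not check.

```python
def make_encoder_options(codec, preset, width, height, **extra):
    """Build a PyNvEncoder-style options dict in which every value is a string.

    The resolution goes under the 's' key as 'WIDTHxHEIGHT', following the
    transcoding example; any extra options are converted with str().
    """
    options = {"codec": codec, "preset": preset, "s": f"{width}x{height}"}
    for key, value in extra.items():
        options[key] = str(value)
    return options


# Reproduces the dictionary used in the transcoding example.
opts = make_encoder_options("h264", "hq", 1920, 1080)
```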
The HardwareSurface class is a wrapper around CUdeviceptr. It has one method, GetCudaDevicePtr(), which returns the CUdeviceptr handle to the CUDA memory object.
For memory transfers between Host and Device, there are two classes: PyFrameUploader and PySurfaceDownloader.
PyFrameUploader is used to upload a NumPy array to the GPU. It has only one method, which uploads a NumPy array to the GPU and returns a handle to the uploaded Surface. The next time this method is called, the previously returned Surface may be reused.
PySurfaceDownloader is used to download a Surface from the GPU. It also has only one method, which downloads a GPU-side Surface into a CPU-side NumPy array. Each call returns a new NumPy array instance.
Finally, the PySurfaceConverter class performs GPU-accelerated color space and pixel format conversion. The supported conversions are:
- YUV420 to NV12
- NV12 to YUV420
- NV12 to RGB
PySurfaceConverter has one method, which performs the conversion on the GPU and returns a handle to a Surface in the output format. The next time this method is called, the previously returned Surface may be reused.
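As a CPU reference for what the YUV420 and NV12 conversions do to the chroma data, here is a NumPy sketch. It assumes "YUV420" means the planar I420 layout (Y plane, then U plane, then V plane), 8-bit samples, even dimensions and no row padding; these are assumptions, and the sketch is not VPF's implementation, which runs on the GPU. NV12-to-RGB is left out because it depends on a color matrix and range that the text does not specify.

```python
import numpy as np


def nv12_to_yuv420(nv12, width, height):
    """CPU reference: de-interleave NV12's UV plane into planar YUV420 (Y, U, V)."""
    nv12 = np.asarray(nv12, dtype=np.uint8).reshape(-1)
    luma = width * height
    uv = nv12[luma:]
    # With even widths, even offsets of the flattened UV plane are U and odd ones are V.
    return np.concatenate([nv12[:luma], uv[0::2], uv[1::2]])


def yuv420_to_nv12(yuv, width, height):
    """CPU reference: interleave planar U and V back into NV12's single UV plane."""
    yuv = np.asarray(yuv, dtype=np.uint8).reshape(-1)
    luma = width * height
    chroma = luma // 4  # each chroma plane is subsampled 2x horizontally and vertically
    u = yuv[luma : luma + chroma]
    v = yuv[luma + chroma : luma + 2 * chroma]
    uv = np.empty(2 * chroma, dtype=np.uint8)
    uv[0::2] = u
    uv[1::2] = v
    return np.concatenate([yuv[:luma], uv])


nv12_frame = np.arange(12, dtype=np.uint8)  # a 4x2 NV12 frame
planar = nv12_to_yuv420(nv12_frame, 4, 2)
round_trip = yuv420_to_nv12(planar, 4, 2)
```

Only the chroma bytes move; the luma plane is identical in both layouts, which is why these two conversions are cheap compared with NV12 to RGB.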
VPF provides developers with a simple yet powerful Python tool for fully hardware-accelerated video encoding, decoding and processing. Thanks to the C++ code underneath the Python bindings, it lets you achieve high GPU utilization with just tens of lines of code. Decoded video frames are exposed either as NumPy arrays or as CUDA device pointers, for simpler interaction and easier feature extension. VPF imposes no restrictions beyond those of the NVIDIA Video Codec SDK and lets you fully utilize the potential of NVIDIA professional-grade GPUs.