NVIDIA Performance Primitives

Image and Signal Processing on GPUs



The NVIDIA Performance Primitives (NPP) library provides GPU-accelerated image, video, and signal processing functions that perform up to 30x faster than CPU-only implementations. With over 5,000 primitives for image and signal processing, you can easily perform tasks such as color conversion, image compression, filtering, thresholding and image manipulation.



Engineers, scientists, and researchers in domains such as computer vision, industrial inspection, robotics, medical imaging, telecommunications, deep learning, and high-performance computing can use the NPP library to quickly bring up applications that need high-performance, low-level image or signal processing, all through simple function calls.








NPP example: Euclidean Distance Transform (EDT)





Performance at Any Scale

The NPP library optimizes the use of available computing resources so that your application achieves maximum performance across data center, workstation and embedded platforms.

Simple Setup

Ready-to-use, domain-specific, high-performance primitives offer a rich set of functions and support a large variety of image formats. NPP serves as a drop-in replacement for the Intel Integrated Performance Primitives (IPP) CPU library.

Designed for Flexibility

Use NPP as a stand-alone library to add GPU acceleration to your application in a matter of hours, or as a cooperative library that interoperates efficiently with your existing GPU code. It includes both low-level primitives and high-level capabilities.




Comparative Performance

Test Setup
CPU: IPP 2018 running on an Intel Xeon Gold 6240 @ 2 GHz, 3.9 GHz Turbo (Cascade Lake) server with Hyper-Threading on; Ubuntu 18.04
GPU: Tesla T4 (TU104), 1 x 16097 MiB, 1 x 40 SM
GPU: Tesla V100-SXM2-32GB (GV100), 1 x 32510 MiB, 1 x 80 SM
GPU: A100-SXM4-40GB (GA100), 1 x 40557 MiB, 1 x 108 SM
CUDA Driver: 445.33 (r445_00); CUDA Toolkit 11.0
Speedup represents the average bandwidth increase over all routines

Key Features

  • Accepts raw uncompressed image or signal data
  • Supports multiple RGB and YUV image and video formats
  • Provides ColorTwist functions for working in derived color spaces, including YCoCg (H.265) and PCA
  • Handles high-fidelity 10-bit or 12-bit HDR video (e.g., cooled-sensor astrophotography)
  • Avoids boundary effects; operates on Regions of Interest (ROIs) given as width, height pairs
  • Alpha channel support
  • Single, three (RGB), or four channel (RGBA) image formats
  • Supports 8u, 16s/16u, 32f image bit depths
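
To make the ColorTwist and ROI features above more concrete, here is a minimal CPU-side sketch in plain Python. It is not the NPP API and does not run on the GPU. It only illustrates the two ideas: a 3x4 twist matrix applied to each pixel, and a region of interest addressed by a start offset, a line step in bytes, and a width, height pair. The names color_twist_roi, pixel, and RGB_TO_YCOCG are hypothetical. The rounding and saturation to 0..255, and the +128 offset on the two chroma channels for 8-bit storage, are assumptions made for this illustration.

```python
# CPU-only illustration; all names here are hypothetical, not NPP functions.

# 3x4 twist matrix: each output channel is a weighted sum of R, G, B plus a
# constant. Rows give Y, Co, Cg; the +128 chroma offset is an assumption
# chosen so the values fit in unsigned 8-bit storage.
RGB_TO_YCOCG = [
    [0.25, 0.5, 0.25, 0.0],     # Y
    [0.5, 0.0, -0.5, 128.0],    # Co
    [-0.25, 0.5, -0.25, 128.0], # Cg
]


def color_twist_roi(buf, step, x, y, width, height, twist):
    """Apply `twist` in place to an ROI of an interleaved 8-bit RGB buffer.

    buf           -- bytearray with the whole image, rows padded to `step`
    step          -- bytes between the starts of consecutive rows
    x, y          -- top-left pixel of the ROI
    width, height -- ROI size in pixels
    """
    start = y * step + x * 3          # byte offset of the ROI's first pixel
    for row in range(height):
        row_start = start + row * step
        for col in range(width):
            i = row_start + col * 3
            r, g, b = buf[i], buf[i + 1], buf[i + 2]
            for c in range(3):
                m = twist[c]
                value = m[0] * r + m[1] * g + m[2] * b + m[3]
                # Round, then saturate to the 8-bit range (assumed behavior).
                buf[i + c] = min(255, max(0, int(round(value))))


def pixel(buf, step, x, y):
    """Return the three channel values of pixel (x, y) as a tuple."""
    i = y * step + x * 3
    return (buf[i], buf[i + 1], buf[i + 2])


# A 4x3 image whose rows are padded from 12 to 16 bytes.
WIDTH, HEIGHT, STEP = 4, 3, 16
image = bytearray(STEP * HEIGHT)
for yy in range(HEIGHT):
    for xx in range(WIDTH):
        o = yy * STEP + xx * 3
        image[o:o + 3] = bytes((200, 100, 40))

# Convert only the 2x1 region starting at pixel (1, 1).
color_twist_roi(image, STEP, 1, 1, 2, 1, RGB_TO_YCOCG)
print(pixel(image, STEP, 1, 1))  # inside the ROI: (110, 208, 118)
print(pixel(image, STEP, 0, 1))  # outside the ROI: (200, 100, 40)
```

Only the pixels inside the ROI change; neighboring pixels and the row padding are left untouched, which is what lets a caller process an interior region without disturbing the image border.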

Resources