NVIDIA Performance Primitives

The NVIDIA Performance Primitives (NPP) library is a collection of GPU-accelerated image, video, and signal processing functions that run 5x to 10x faster than comparable CPU-only implementations. With NPP, developers can draw on more than 1,900 image processing primitives and approximately 600 signal processing primitives to achieve significant improvements in application performance in a matter of hours.

Whether you are simply replacing CPU primitives with GPU-accelerated versions or integrating NPP primitives with your existing GPU-accelerated pipeline, NPP delivers high performance while reducing your development time.



GPU-Accelerated “GrabCut” example using NPP graphcut primitive


Review the latest CUDA performance report to learn how much you could accelerate your code.

It’s easy to build GPU-accelerated signal processing applications using NPP primitives

Key Features


  • Eliminates unnecessary copying of data to/from CPU memory
    • Process data that is already in GPU memory
    • Leave results in GPU memory so they are ready for subsequent processing
  • Data Exchange and Initialization
    • Set, Convert, Copy, CopyConstBorder, Transpose, SwapChannels
  • Arithmetic and Logical Operations
    • Add, Sub, Mul, Div, AbsDiff, Threshold, Compare
  • Color Conversion
    • RGBToYCbCr, YCbCrToRGB, YCbCrToYCbCr, ColorTwist, LUT_Linear
  • Filter Functions
    • FilterBox, Filter, FilterRow, FilterColumn, FilterMax, FilterMin, Dilate, Erode, SumWindowColumn, SumWindowRow
  • JPEG
    • DCTQuantInv, DCTQuantFwd, QuantizationTableJPEG
  • Geometry Transforms
    • Mirror, WarpAffine, WarpAffineBack, WarpAffineQuad, WarpPerspective, WarpPerspectiveBack, WarpPerspectiveQuad, Resize
  • Statistics Functions
    • Mean_StdDev, NormDiff, Sum, MinMax, HistogramEven, RectStdDev
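
To make the "replace a CPU primitive with a GPU-accelerated version" scenario more concrete, here is a minimal sketch of the kind of CPU routine that a primitive such as FilterBox could stand in for. It is a plain C++ reference, not NPP code: the function name boxFilterCpu, its parameters, and its border policy (edge pixels are copied unchanged) are assumptions chosen for illustration, and NPP's own border handling and rounding may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not NPP code): replaces each interior pixel of an
// 8-bit, single-channel, row-major image with the integer mean of the
// (2*radius+1) x (2*radius+1) window centred on it. Pixels whose window would
// extend past the image edge are copied through unchanged (an assumed policy).
std::vector<std::uint8_t> boxFilterCpu(const std::vector<std::uint8_t>& src,
                                       int width, int height, int radius)
{
    assert(width > 0 && height > 0 && radius >= 0);
    assert(src.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    std::vector<std::uint8_t> dst(src);  // start from a copy so borders are preserved
    const int side = 2 * radius + 1;
    const int windowArea = side * side;

    for (int y = radius; y < height - radius; ++y) {
        for (int x = radius; x < width - radius; ++x) {
            int sum = 0;
            // Accumulate every pixel in the window around (x, y).
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    sum += src[static_cast<std::size_t>(y + dy) * width + (x + dx)];
                }
            }
            // Integer division truncates; the mean of 8-bit values fits in 8 bits.
            dst[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>(sum / windowArea);
        }
    }
    return dst;
}
```

Calling boxFilterCpu on a 3x3 image with radius 1 averages the nine pixels into the centre and leaves the eight edge pixels as they were. A GPU version built on NPP would instead run the library's box filter on an image already resident in GPU memory and leave the result there for the next primitive in the pipeline; the NPP documentation has the exact function signatures for a given CUDA Toolkit release.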


The NVIDIA Performance Primitives library is freely available as part of the CUDA Toolkit at www.nvidia.com/getcuda.
For more information on NPP and other GPU-accelerated libraries: