High Speed Data Compression Using NVIDIA GPUs


The nvCOMP library provides fast lossless data compression and decompression using a GPU. It features generic compression interfaces to enable developers to use high-performance GPU compressors in their applications.

The library offers easy-to-use parallel methods for compressing and decompressing buffers on the GPU, along with API calls designed to make those operations efficient.

The nvCOMP library is designed to be modular, with the ability to add new implementations without changing the high-level user interface. We’re working on additional schemes and on a "how to" guide for developers who want to add their own custom algorithms.

A subset of the nvCOMP library with basic compression and decompression schemes is available as an open-source project on GitHub. The full library is available here under the standard NVIDIA Software License Agreement.

Explore what’s new in the latest release…

nvCOMP Key Features

  • Support for Cascaded, LZ4, Snappy, GDeflate, and bitcomp compression and decompression
    • Cascaded, LZ4 and Snappy are released as OSS on GitHub: see examples, benchmarks, and README for more details of the formats and GPU implementations
    • GDeflate and bitcomp are provided as binary-only compressors. Download the package (tar.gz) for your OS, then follow the instructions in the README included in the tarball to install and use these compressors. Examples are available on GitHub.
  • Flexible APIs
    • The low-level API targets advanced users: metadata and chunking must be handled outside of nvCOMP. The low-level APIs perform batch compression and decompression of multiple streams, and they are lightweight and fully asynchronous.
    • The high-level API is provided for ease of use: metadata and chunking are handled internally by nvCOMP, which makes it the easiest way to start using nvCOMP in applications. Some of the high-level APIs are synchronous, so the low-level APIs are recommended for best performance and flexibility.
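
To make "chunking must be handled outside of nvCOMP" more concrete, here is a minimal CPU-side sketch in Python of the bookkeeping a caller might do before handing data to a batched compressor. It does not call nvCOMP. The function names and the 64 KiB chunk size are illustrative assumptions, not part of the library.

```python
from typing import List, Tuple


def split_into_chunks(data: bytes, chunk_size: int = 64 * 1024) -> List[bytes]:
    """Split a buffer into fixed-size chunks (the last one may be shorter).

    chunk_size is an assumed, illustrative value; a real application
    would pick it based on the compressor and the data.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_batch_metadata(chunks: List[bytes]) -> Tuple[List[int], List[int]]:
    """Return (offsets, sizes) describing where each chunk sits in the input.

    This is the kind of metadata the caller has to keep when the
    compression API does not track it internally.
    """
    sizes = [len(c) for c in chunks]
    offsets: List[int] = []
    position = 0
    for size in sizes:
        offsets.append(position)
        position += size
    return offsets, sizes


# Example input: 256,000 bytes of repeating byte values.
data = bytes(range(256)) * 1000
chunks = split_into_chunks(data)
offsets, sizes = build_batch_metadata(chunks)

# After decompressing each chunk independently, the caller reassembles
# the original buffer by concatenating the chunks in order.
restored = b"".join(chunks)
```

Each chunk can then be treated as an independent stream in a batch, which is what allows many of them to be processed in parallel. The high-level API hides this bookkeeping from the caller.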

nvCOMP Performance

The nvCOMP library includes Cascaded, Snappy and LZ4 compression methods. Cascaded compression methods demonstrate high performance with up to 500 GB/s throughput and a high compression ratio of over 60x on numerical data from analytical workloads. Snappy and LZ4 methods can achieve over 100 GB/s compression and decompression throughput depending on the dataset, and show good compression ratios for arbitrary byte streams.

Compression ratio and compression/decompression performance on the GPU for the three methods available in nvCOMP (Cascaded, Snappy and LZ4). Each column shows results for a single column from an analytical dataset derived from Fannie Mae’s Single-Family Loan Performance Data.
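
As a rough intuition for why numerical columns from analytical workloads can compress so well, the following toy Python sketch chains run-length encoding, delta encoding, and a bit-width estimate, assuming a cascaded scheme is built from simple stages of this kind. It runs on the CPU, is not nvCOMP's implementation, and all function names are hypothetical.

```python
from typing import List, Tuple


def run_length_encode(values: List[int]) -> Tuple[List[int], List[int]]:
    """Collapse consecutive repeats into (run_values, run_counts)."""
    run_values: List[int] = []
    run_counts: List[int] = []
    for v in values:
        if run_values and run_values[-1] == v:
            run_counts[-1] += 1
        else:
            run_values.append(v)
            run_counts.append(1)
    return run_values, run_counts


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Return the first value and the differences between neighbours."""
    if not values:
        raise ValueError("cannot delta-encode an empty list")
    return values[0], [b - a for a, b in zip(values, values[1:])]


def bits_per_value(values: List[int]) -> int:
    """Bits needed per value if each is stored as an offset from the minimum."""
    if not values:
        return 0
    return max(1, (max(values) - min(values)).bit_length())


def decode(first: int, deltas: List[int], run_counts: List[int]) -> List[int]:
    """Invert the delta stage, then the run-length stage."""
    run_values = [first]
    for d in deltas:
        run_values.append(run_values[-1] + d)
    out: List[int] = []
    for v, n in zip(run_values, run_counts):
        out.extend([v] * n)
    return out


# A sorted column with repeated values, as often seen in analytical tables.
column = [v for v in range(1000, 1008) for _ in range(4)]

run_values, run_counts = run_length_encode(column)  # 32 values become 8 runs
first, deltas = delta_encode(run_values)            # steps between runs are tiny
```

In this toy input, 32 values collapse to 8 runs, and the differences between runs are all identical, so each can be stored in a single bit instead of a full-width integer. Real data is less regular, so actual ratios depend on the dataset.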

For additional information and the latest performance numbers, see the Optimizing Data Transfer Using Lossless Compression with nvCOMP developer blog and the Optimizing Lossless Compression algorithms on the GPU GTC’21 talk. You can also gather your own results by running the benchmarks from the nvCOMP GitHub page.