Software-Based Compression for Analytical Workloads
Nikolay Sakharnykh, NVIDIA | Rene Mueller, NVIDIA
GTC 2020
Real-world analytical pipelines have very large memory requirements and stress the CPU-GPU and GPU-GPU interconnects. GPU memory is limited, so data is often offloaded to CPU memory and brought back later for further processing on the GPU. This can become a significant bottleneck for the end-to-end pipeline. Fast compression and decompression can improve performance by reducing the amount of data sent over the interconnect, or even eliminate the need to offload data entirely by storing it in GPU memory in a compressed format and performing subsequent operations on the compressed data. We'll survey various parallel compression algorithms, from LZ-based schemes to run-length encoding, dictionary encoding, and bit-packing. We'll discuss efficient GPU implementations and their integration into data-science frameworks such as RAPIDS, and also highlight compression mechanisms in hardware. Our best approaches can achieve a 50x compression ratio and sustain 50 GB per second compression/decompression speed on a Tesla T4.
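To make two of the surveyed schemes concrete, here is a minimal CPU-only sketch in Python of run-length encoding followed by bit-packing of the run values. It is an illustration under stated assumptions, not the GPU implementation presented in the session: the function names (`rle_encode`, `rle_decode`, `bit_pack`, `bit_unpack`) are hypothetical, the input is assumed to be a column of non-negative integers, and no parallelism is attempted.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> Tuple[List[int], List[int]]:
    """Collapse consecutive repeats into parallel (run value, run length) lists."""
    run_values: List[int] = []
    run_lengths: List[int] = []
    for v in values:
        if run_values and run_values[-1] == v:
            run_lengths[-1] += 1          # extend the current run
        else:
            run_values.append(v)          # start a new run
            run_lengths.append(1)
    return run_values, run_lengths


def rle_decode(run_values: List[int], run_lengths: List[int]) -> List[int]:
    """Expand (run value, run length) pairs back into the original column."""
    out: List[int] = []
    for v, n in zip(run_values, run_lengths):
        out.extend([v] * n)
    return out


def bit_pack(values: List[int]) -> Tuple[int, int, bytes]:
    """Pack non-negative ints with the smallest fixed bit width that fits all of them.

    Returns (bit width, element count, packed little-endian payload).
    """
    width = max(1, max(values, default=0).bit_length())
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)           # element i occupies bits [i*width, (i+1)*width)
    nbytes = (len(values) * width + 7) // 8
    return width, len(values), acc.to_bytes(nbytes, "little")


def bit_unpack(width: int, count: int, payload: bytes) -> List[int]:
    """Inverse of bit_pack."""
    acc = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]
```

For a column such as `[7, 7, 7, 7, 3, 3, 9, 9, 9, 9, 9, 9]`, `rle_encode` yields three runs, and `bit_pack` then stores the three run values in 4 bits each rather than a full machine word. Cascading lightweight schemes in this way is one plausible route to high ratios on repetitive, low-cardinality columns; the ratio achieved on real data depends entirely on its distribution.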