NVIDIA is using CUDA and the power of GeForce to accelerate Adaptive Scalable Texture Compression (ASTC) along multiple axes!

Guest Blogger: Fei Yang

As a programmer, I love challenging jobs, and this is one of them!

Adaptive Scalable Texture Compression (ASTC) is a new standard for texture compression that gives development teams and art directors the ability to balance texture quality against texture size. Texture compression is not new, but ASTC provides a far greater number of options than older standards. It offers a wide spectrum of bit rates, from 8 bits/texel (4x4 tiles) down to 0.89 bits/texel (12x12 tiles). For more information on ASTC generally and how best to use it for different texture assets, check out NVIDIA's ASTC usage guide.
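Those bit rates follow directly from the format: every compressed ASTC block occupies a fixed 128 bits regardless of its tile footprint, so the bit rate is just 128 divided by the number of texels in the tile. A quick sketch (the helper name is ours, not part of any ASTC API):

```python
def astc_bits_per_texel(width, height):
    """Bit rate for an ASTC tile of width x height texels."""
    BLOCK_BITS = 128  # every compressed ASTC block is 128 bits
    return BLOCK_BITS / (width * height)

print(astc_bits_per_texel(4, 4))              # 8.0 bits/texel
print(round(astc_bits_per_texel(12, 12), 2))  # 0.89 bits/texel
```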

To keep the quality of the encoded texture from falling off sharply as the bit rate drops, ASTC allows the encoder to choose from a large set of configurations for each specific tile. For example, it introduces an "index-infill" process in texture decoding, so the arrays that actually store the weights need not be the same size as the image tiles. This flexibility, however, imposes a heavy computational burden on the encoder, which must traverse the large configuration space and evaluate each candidate.
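To see why this search is expensive, here is a deliberately simplified sketch of the pattern: for each tile, try every candidate configuration (here reduced to just the weight-grid size used by index infill, on a 1D grayscale "tile") and keep the one with the lowest reconstruction error. The function names, the linear infill, and the squared-error metric are illustrative stand-ins, not the real ASTC encoder:

```python
def infill(weights, out_len):
    """Linearly interpolate a short weight array back up to out_len samples."""
    if len(weights) == 1:
        return [weights[0]] * out_len
    out = []
    for i in range(out_len):
        pos = i * (len(weights) - 1) / (out_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(weights) - 1)
        frac = pos - lo
        out.append(weights[lo] * (1 - frac) + weights[hi] * frac)
    return out

def best_grid_size(texels, candidate_sizes):
    """Evaluate every candidate weight-grid size; return (size, error) of the best."""
    best = None
    for n in candidate_sizes:
        # Sample the tile down to n weights, then infill back to full size.
        step = (len(texels) - 1) / (n - 1) if n > 1 else 0
        weights = [texels[round(i * step)] for i in range(n)]
        recon = infill(weights, len(texels))
        err = sum((a - b) ** 2 for a, b in zip(texels, recon))
        if best is None or err < best[1]:
            best = (n, err)
    return best

tile = [0, 2, 4, 6, 8, 10, 12, 14]   # a smooth 1D "tile"
size, err = best_grid_size(tile, [2, 3, 5])
print(size)  # a smooth gradient needs only 2 weights
```

The real encoder multiplies this inner loop across many more configuration dimensions (block modes, partitions, color endpoint formats), which is exactly the cost the GPU implementation attacks.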

My colleagues and I have been working hard to address the performance issue by utilizing the computation power of NVIDIA GPUs and CUDA.

By dividing the compression task into small units and executing them in parallel, we achieve a significant speed-up while preserving the same quality level as the reference CPU implementation. As you can see, parallelizing the compression cuts processing time by roughly three to four times. But there's more work to be done and more speed to be had if we can implement additional parallelism.

ASTC compression times, CUDA on GPU vs CPU

Compression Quality Comparison

Original image alongside CPU and GPU encodes at 4x4 and 8x8 tiles.

Parallelism in Two Axes

When we evaluate a problem for a CUDA implementation, we generally classify it as a "good one" or a "bad one". This case falls somewhere in between: it's not an ideal problem for CUDA, but it's not a bad one either. The problem contains two axes: one is data space, and the other is configuration space. As a first step, we have implemented data parallelism, so different parts of the texture can be compressed in parallel. This saves a huge amount of time. The next step to further enhance performance is to exploit parallelism along the configuration axis as well.
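The data-parallel axis works because each tile compresses independently of its neighbors: every tile can be handed to its own worker (on the GPU, for example, one CUDA thread block per tile). Here is a hypothetical sketch of that decomposition, with a Python thread pool standing in for the GPU and a trivial `compress_tile` stand-in for the real encoder:

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 4  # 4x4 tiles, the highest-bit-rate ASTC mode

def split_into_tiles(image, width, height):
    """Yield (x, y, texels) for each TILE x TILE tile of a flat image."""
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            texels = [image[(ty + j) * width + (tx + i)]
                      for j in range(TILE) for i in range(TILE)]
            yield tx, ty, texels

def compress_tile(args):
    """Stand-in 'compression': record the tile origin and its mean texel."""
    tx, ty, texels = args
    return (tx, ty, sum(texels) / len(texels))

width = height = 8
image = list(range(width * height))  # toy 8x8 single-channel texture

# Every tile is independent, so the map over tiles parallelizes freely.
with ThreadPoolExecutor() as pool:
    blocks = list(pool.map(compress_tile, split_into_tiles(image, width, height)))

print(len(blocks))  # 4 tiles for an 8x8 image with 4x4 tiles
```

Exploiting the configuration axis is harder because the candidates for one tile all feed a single reduction (pick the lowest-error configuration), which requires coordination between workers rather than a simple independent map.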

ASTC Data Configuration Space Matrix

NVASTC Alpha Out Now!

We still have a lot of work ahead of us to achieve two-axis parallelism, but we are progressing nicely. The current implementation with data parallelism is a huge leap from the CPU-based encoder and should assist developers in their testing and usage of ASTC. To that end, we have posted an alpha version of our encoder in the Download Center.

We are eager to hear your comments and feedback!