A defining feature of the new Volta GPU Architecture is its Tensor Cores, which give the Tesla V100 accelerator a peak throughput 12 times the 32-bit floating point throughput of the previous-generation Tesla P100.
In this post we introduce the “register cache”, an optimization technique that develops a virtual caching layer for threads in a single warp. It is a software abstraction implemented on top of the NVIDIA GPU shuffle primitive.
Researchers from Microsoft and Hong Kong University of Science and Technology developed a deep learning method that can transfer the style and color from multiple reference images onto another photograph.
One of my favorite things is getting to talk to people about GPU computing and Python. The productivity and interactivity of Python combined with the high performance of GPUs is a killer combination for many problems in science and engineering.