Researchers at VideoGorillas Use AI to Remaster Archived Content to 4K Resolution and Above

Over the past few years, film and video standards have continued to evolve. There is a growing demand for higher fidelity imagery and resolutions to deliver a more immersive viewing experience. 

With 4K as the current standard and 8K experiences becoming the new norm, older content doesn’t meet today’s visual standard. The remastering process aims to revitalize older content to match these new standards. It has become a common practice in the industry, allowing audiences to revisit older favorites and enjoy them in a modern viewing experience. 

As resolution increases, it becomes more difficult to remaster content without major artifacts. Studios have to apply additional resources and manage longer lead times to produce reasonable quality content. 

To meet the growing pace of innovation, one company is developing a new AI-enhanced solution to exceed visual expectations at lower costs. 

Los Angeles-based VideoGorillas develops state-of-the-art media technology that incorporates AI techniques built on NVIDIA CUDA-X and Studio Stack. By integrating GPU-accelerated machine learning, deep learning, and computer vision, their techniques allow studios to achieve higher visual fidelity and increased productivity when it comes to remastering.

A recent innovation they’re developing is a new production-assisted AI technique called Bigfoot Super Resolution. This technique converts films from native 480p to 4K by using neural networks to predict the missing pixels at a quality high enough that the original content appears almost as if it had been filmed in 4K.
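To give a sense of how many pixels have to be predicted, here is a small back-of-the-envelope sketch. The frame sizes are assumptions (720x480 for 480p and 3840x2160 for UHD 4K); the article does not state the exact source or target dimensions.

```python
def pixel_ratio(src, dst):
    """Return how many output pixels exist for every input pixel."""
    (src_w, src_h), (dst_w, dst_h) = src, dst
    return (dst_w * dst_h) / (src_w * src_h)


# Assumed frame sizes; not taken from the article.
SD_480P = (720, 480)
UHD_4K = (3840, 2160)

ratio = pixel_ratio(SD_480P, UHD_4K)
# Share of output pixels with no direct counterpart in the source frame.
synthesized_share = 1 - 1 / ratio

print(f"{ratio:.0f}x more pixels, {synthesized_share:.1%} must be predicted")
```

Under these assumptions the 4K frame holds 24 times as many pixels as the 480p frame, so roughly 96% of the output has to be inferred rather than copied.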

“Bigfoot Super Resolution is an entirely new approach to upscaling powered by NVIDIA RTX technology with a focus on delivering levels of video quality and operational efficiencies currently not achievable using traditional methods. We are very excited to bring this solution to market and look forward to helping our studio and broadcast partners unlock incremental value from their content libraries.” – Jason Brahms, CEO VideoGorillas.

As VideoGorillas continues to refine this technique for release, they aim to provide broadcasters and major film studios with a superior way to remaster their content libraries while preserving original artistic intent.

“We are creating a new visual vocabulary for film and television material that’s based on AI techniques. We’re working to train neural networks to remove a variety of visual artifacts, as well as understand the era, genre, and medium of what we are remastering. Using these neural networks allows us to increase perceptual quality and preserve the original look and feel of the material” – Alex Zhukov, CTO VideoGorillas

The research team at VideoGorillas trains a unique recurrent neural network (RNN) for each project, accelerated by NVIDIA GPUs. The network learns the characteristics of titles created during the same era, in the same genre, and with the same production method. New content passed through this network then maintains the look and feel of that era and genre, preserving artistic intent.

A generative adversarial network (GAN) removes unwanted noise and artifacts in low-resolution areas, replacing them with newly synthesized, upscaled image detail. The outcome is a model that can identify when visual loss is occurring.

The networks are trained with PyTorch using CUDA and cuDNN, on millions of images per film. However, loading thousands of images creates a bottleneck in their pipeline, so VideoGorillas is integrating DALI (the NVIDIA Data Loading Library) to accelerate training.
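The data-loading bottleneck can be illustrated with a minimal sketch of prefetching: a background thread prepares upcoming batches while the consumer works on the current one. This is not DALI's API, which also moves decoding and augmentation onto the GPU; `prefetch`, `slow_loader`, and `buffer_size` are hypothetical names used only to show the overlap idea.

```python
import queue
import threading
import time


def prefetch(batches, buffer_size=4):
    """Yield items from `batches`, loading them ahead of time on a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def worker():
        try:
            for batch in batches:
                buffer.put(batch)  # blocks when the buffer is full
        finally:
            buffer.put(sentinel)  # always signal completion

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item


def slow_loader(count, delay=0.01):
    """Stand-in for image decoding: each 'batch' takes `delay` seconds to load."""
    for index in range(count):
        time.sleep(delay)
        yield index


loaded = list(prefetch(slow_loader(5)))
print(loaded)
```

While the consumer processes one batch, the worker is already loading the next, so load time and compute time overlap instead of adding up. A real pipeline would also need to surface loader errors to the consumer, which this sketch does not do.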

A cornerstone of video is the aggregation of visual information across adjacent frames. VideoGorillas uses Optical Flow to compute the relative motion of pixels between images. It provides frame consistency and minimizes any contextual or style loss within the image. 
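As a rough illustration of motion estimation between frames, the sketch below finds a single global shift by exhaustive block matching. This is a deliberately simplified stand-in, not the method VideoGorillas uses: real optical flow produces a motion vector per pixel, and the function names here are hypothetical.

```python
def estimate_shift(prev, curr, max_shift=3):
    """Return the (dx, dy) that best maps `prev` onto `curr`.

    Tries every shift within +/- max_shift and keeps the one with the
    lowest mean absolute difference over the overlapping region.
    """
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(height):
                for x in range(width):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += abs(curr[ny][nx] - prev[y][x])
                        count += 1
            if count == 0:
                continue  # no overlap for this shift
            cost = total / count
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def make_frame(width, height, x0, y0, size=2):
    """Build a black frame containing one bright square (toy test data)."""
    frame = [[0.0] * width for _ in range(height)]
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            frame[y][x] = 1.0
    return frame


prev_frame = make_frame(8, 8, x0=1, y0=2)
next_frame = make_frame(8, 8, x0=3, y0=3)  # square moved right 2, down 1
shift = estimate_shift(prev_frame, next_frame)
print(shift)  # (2, 1)
```

Once the motion between frames is known, detail from neighboring frames can be aligned with the current one, which is what makes frame-to-frame consistency possible.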

This new level of visual fidelity augmented by AI is only possible with NVIDIA RTX, which delivers 200x performance gains vs. CPUs for their mixed-precision and distributed training workflows. VideoGorillas trains super resolution networks on RTX 2080 GPUs, and on NVIDIA Quadro GPUs for larger-scale projects.

The extra power offered by NVIDIA Quadro enables VideoGorillas to apply super-res to HDR, high-bit-depth videos at up to 8K resolution, as well as achieve faster optical flow performance. The Tensor Cores in RTX GPUs provide a major boost in computing performance over CPUs, making them ideal for the mixed-precision training involved in VideoGorillas’ models.
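One reason mixed-precision training needs care is that very small gradients underflow to zero in half precision. The sketch below shows the general idea of loss scaling, a standard remedy, using only Python's standard library. The gradient value and `LOSS_SCALE` are hypothetical, and nothing here reflects VideoGorillas' actual training configuration.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


LOSS_SCALE = 1024.0   # hypothetical scale factor
tiny_gradient = 1e-8  # hypothetical gradient, below fp16's smallest subnormal

# Stored directly in fp16, the gradient is lost entirely.
unscaled = to_fp16(tiny_gradient)

# Scaled up before the fp16 round trip, it survives; the scale is
# divided back out afterwards in higher precision.
scaled = to_fp16(tiny_gradient * LOSS_SCALE)
recovered = scaled / LOSS_SCALE

print(unscaled)   # 0.0
print(recovered)  # close to 1e-8
```

Frameworks such as PyTorch automate this scaling, so the sketch is only meant to show why the step exists.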

“With CPUs, super resolution of videos to 4k and 8k is really not feasible – it’s just too slow to perform. NVIDIA GPUs are really the only option to achieve super resolution with higher image qualities” – Alex Zhukov, CTO VideoGorillas

While on-premises systems work well for their training needs, they also extend their workloads to the cloud. Using Kubernetes on NVIDIA GPUs, they orchestrate Super Res inference jobs accelerated by NVIDIA Tesla V100 GPUs, both in local data centers and on Amazon Web Services and Google Cloud Platform.

To learn more about VideoGorillas’ latest projects, visit their website.

Apply AI to your visual applications by joining the NGX early access program and downloading the Optical Flow SDK.
