Content Creation / Rendering

Advancing Telepresence and Next-Generation Digital Human Technology with NVIDIA Maxine


At SIGGRAPH 2024 this week, NVIDIA is showcasing the latest advancements in the NVIDIA Maxine AI developer platform, available through NVIDIA AI Enterprise. This platform empowers you to deploy cutting-edge AI features that enhance audio and video quality and enable augmented reality effects.

NVIDIA just announced the upcoming availability of Maxine Video Relighting for early access developers, alongside the production launch of the Maxine Eye Contact NVIDIA NIM microservice.

Maxine 3D is coming soon, alongside NVIDIA ACE, a suite of generative AI technologies for speech, intelligence, and animation. Together they bring true-to-life digital human technology within reach of a wide range of developers and applications, enabling real-time, photoreal 3D avatars with commodity video-conferencing audio and video devices.

The Eye Contact NIM microservice, as well as Audio2Face-2D (also known as Speech Live Portrait), can now be discovered and trialed through the comprehensive NVIDIA API Catalog.

Existing NVIDIA Maxine features continue to evolve. For example, Studio Voice and Background Noise Reduction 2.0 have undergone significant improvements in both performance and quality and are now production-ready.

Coming soon to NVIDIA Maxine

At the forefront of NVIDIA innovations is Maxine 3D, a groundbreaking technology that seamlessly converts 2D video portrait inputs into immersive 3D avatars in real time. This advancement enables you to integrate 3D avatars into real-time, two-way communication using commodity video-conferencing devices.

NVIDIA Maxine is driving the adoption of virtual and telepresence technologies in virtual event spaces, video conferences, video processing and editing software, and other immersive environments. It uses NVIDIA RTX rendering for lifelike, ultra-realistic visuals and promises to redefine user experiences by transforming standard 2D video inputs into dynamic 3D avatars.

“NVIDIA Maxine brings us closer to realizing the dream I’ve had since the founding of Looking Glass: virtual teleportation between physical spaces,” said Shawn Frayne, co-founder and CEO of Looking Glass.

“With Maxine, we now have the ability to transform any 2D video feed into immersive high-fidelity 3D holographic experiences, without complex camera setups. The simplicity of what this technology enables pairs perfectly with the ethos of Looking Glass—making 3D more accessible for everyone, without having to gear up in headsets.”

Looking Glass has been collaborating with NVIDIA Research to create an innovative video conferencing showcase using holographic 3D displays. This partnership uses NVIDIA technologies, including NVIDIA RTX 6000 Ada GPUs and Maxine 3D, to enable multiple viewers to experience authentic 3D content simultaneously without the need for headsets or eye tracking. The demonstration, featured at NVIDIA GTC 2024 and SIGGRAPH 2023, showcases the ability to synthesize 3D scenes from 2D images, enabling group viewing on Looking Glass’s 32″ landscape and 16″ portrait displays.

Maxine 3D employs AI, neural reconstruction, and real-time rendering to craft highly realistic digital avatars. By harnessing neural radiance fields (NeRF), it reconstructs detailed 3D perspectives from single 2D images.

Figure 1. 3D video conference using Looking Glass Display powered by NVIDIA Maxine

Integrated with Audio2Face-2D technology for instant audio-to-2D facial animation, Maxine elevates these 2D avatars into immersive 3D representations. This breakthrough capability empowers you to create digital humans that closely mimic their real-world counterparts, enriching experiences in virtual meetings, entertainment, and beyond.

Enhanced discoverability, accessibility, and portability

NVIDIA is introducing Maxine features to its API Catalog, enabling you to discover and trial cutting-edge capabilities with ease before committing to Early Access or NVIDIA AI Enterprise. This significantly lowers the barrier to entry for exploring and integrating advanced AI-powered features into applications.

Maxine features available in the API preview catalog will also be available as NVIDIA NIM microservices. Microservices offer a highly optimized and versatile solution for AI deployment, providing prebuilt containers with industry-standard APIs that significantly reduce deployment times from weeks to minutes. These microservices support a wide range of NVIDIA hardware platforms and cloud providers, ensuring portability and ease of integration with popular AI frameworks.

As part of the NVIDIA AI Enterprise software platform, NVIDIA NIM microservices come with rigorous validation, security updates, and enterprise support, making them an ideal choice for businesses seeking enterprise-grade features.

Eye Contact NIM and Audio2Face-2D preview released

Two of Maxine’s most popular features, Eye Contact (now as a NIM microservice) and Audio2Face-2D, are now available in the NVIDIA API Catalog.

Eye Contact enables users to appear to be making direct eye contact during video calls, enhancing engagement and presence in virtual meetings. The release of this microservice enables more portability and flexibility when implementing Eye Contact.

Speech Live Portrait / Audio2Face-2D, released in the Early Access program, animates static portraits based on audio input, creating dynamic, speaking avatars from a single image.

“It took us 2 hours to get Maxine integrated into our app, the API was seamless,” said Benjamin Portman, president and lead developer for Orpheus.

Video 1. Eye Contact NIM Microservice and Video Relighting from NVIDIA Maxine

Advanced video and audio enhancements

As video and audio technology continues to evolve, several new and enhanced features are being introduced to improve the user experience:

  • Video Relighting
  • Studio Voice
  • Background Noise Reduction 2.0
  • Maxine hosted APIs

Video Relighting

The Maxine Video Relighting microservice (currently in Early Access) provides real-time relighting using a 3D HDR content map, seamlessly matching foreground illumination with various backgrounds and environments.

Video Relighting uses AI to improve lighting conditions in real time, ensuring that subjects always look their best, and have matching, realistic lighting regardless of their physical environment or virtual background. This feature is particularly useful for maintaining an optimal appearance in various suboptimal lighting situations.

Studio Voice

The latest iteration of Studio Voice offers significant improvements in quality and performance, making it viable for real-time communications for the first time. This advancement brings studio-quality audio to everyday video conferencing setups using a low-latency model.

Background Noise Reduction 2.0

Background Noise Reduction 2.0 sets a new standard in audio clarity, effectively eliminating background noise while preserving the natural quality of speech. This feature is crucial for maintaining clear communication in diverse environments.

This model is also especially useful when combined with automatic speech recognition (ASR) technology to reduce errors in transcription.

Figure 2 shows the improvements in Character Error Rate (CER) using Background Noise Reduction 2.0.

Figure 2. Character Error Rate (CER) improvements of 35% using Maxine’s Background Noise Reduction 2.0

Figure 3 shows the improvements in Word Error Rate (WER), using Background Noise Reduction 2.0.

Figure 3. Word Error Rate (WER) improvements of 33% using Maxine’s Background Noise Reduction 2.0
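For context, CER and WER are both edit-distance metrics: the minimum number of character (or word) insertions, deletions, and substitutions needed to turn the ASR transcript into the reference, divided by the reference length. The metric definitions below are standard; the example strings in the usage comments are invented for illustration.

```python
def levenshtein(a, b):
    """Minimum edit distance between two sequences (classic DP, O(len(a)*len(b)))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, or substitution (free if symbols match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits normalized by reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits normalized by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / len(ref)

# Example: one substituted word out of four gives WER = 0.25
# wer("the quick brown fox", "the quick brown box")
```

A denoised input that reduces these rates directly translates into fewer transcription corrections downstream.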

Maxine hosted APIs

Maxine features, starting with the industry-leading Eye Contact, are available as REST APIs on NVIDIA Cloud Functions (NVCF). They provide a flexible, low-code deployment option for Maxine algorithms. Studio Voice, mentioned earlier, is coming soon as an NVCF API.
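As an illustration, invoking a hosted Maxine API follows the familiar authenticated REST pattern: a bearer token in the headers and the media payload in the request body. The endpoint URL and payload field names below are hypothetical placeholders, not the actual schema; consult the feature's NVIDIA API Catalog entry for the real invocation details.

```python
# Sketch of an authenticated REST invocation for a hosted Maxine feature.
# INVOKE_URL and the payload field names are hypothetical placeholders;
# the real schema is documented in the NVIDIA API Catalog.
INVOKE_URL = "https://api.nvcf.nvidia.com/v2/nvcf/functions/<function-id>"  # placeholder

def build_request(api_key: str, video_path: str) -> dict:
    """Assemble the pieces of a hosted-API call (URL, headers, payload)."""
    return {
        "url": INVOKE_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        "video": video_path,
    }

# Usage (requires the `requests` package and a valid API key):
# req = build_request(os.environ["NVCF_API_KEY"], "meeting_clip.mp4")
# with open(req["video"], "rb") as f:
#     resp = requests.post(req["url"], headers=req["headers"], files={"input": f})
```

Because the heavy lifting runs on NVIDIA-hosted GPUs, this low-code path needs no local model weights or container setup.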

Empowering developers and industries

NVIDIA Maxine is a comprehensive platform that empowers you to create next-generation applications for telepresence and digital character creation.

By providing these tools, NVIDIA is enabling industries ranging from entertainment and gaming to healthcare and education to use the power of AI-driven communication technologies. The platform’s ability to create immersive 3D experiences from 2D inputs is particularly significant for the burgeoning digital human technology market.

As virtual influencers, AI assistants, and digital avatars become more prevalent, Maxine’s technologies provide the foundation for creating believable and engaging digital personas.

Looking ahead

SIGGRAPH 2024 showed that NVIDIA Maxine is set to play a pivotal role in shaping the future of digital communication and telepresence. With its advanced AI capabilities and focus on developer accessibility, the Maxine developer platform isn’t just enhancing existing communication paradigms. It’s providing you with the tools to create entirely new possibilities for how we interact in digital spaces.

The combination of Maxine 3D, advanced audio-visual enhancements, and easy-to-integrate APIs positions NVIDIA partners at the forefront of the digital human technology revolution. As the market for these technologies continues to grow, NVIDIA innovations are set to enable the next wave of immersive, lifelike digital experiences across industries.
