Experience Real-Time Audio and Video Communication with NVIDIA Maxine

The NVIDIA Maxine developer platform redefines video conferencing and editing by providing developers and businesses with a variety of low-code implementation options. These include GPU-accelerated AI microservices, SDKs, and NVIDIA-hosted API endpoints for AI enhancement of audio and video streams in real time.

The latest Maxine developer platform release introduces early access to Voice Font, improvements to video-driven Live Portrait, and improvements to the popular Maxine Eye Contact feature. In addition, the Video Live Portrait and Voice Font features are now available in the NVIDIA NGC catalog. You can now experience Maxine pretrained generative AI models in action on NVIDIA-accelerated cloud infrastructure.

NVIDIA AI Foundation Models such as Maxine demonstrate how enterprises can now connect their applications to ready-to-integrate NVIDIA Foundations API endpoints and quickly create and deploy performance-optimized AI models with a reduced total cost of ownership (TCO).

NVIDIA Maxine Live Portrait webpage as seen on NVIDIA AI Foundation Models. — *Figure 1. You can now experience NVIDIA Maxine Live Portrait and Voice Font*

The Maxine team is also offering select partners the opportunity to give feedback on early versions of Maxine’s new Studio Voice and speech-driven Live Portrait features. For more information, visit Maxine Microservices Early Access Program and Maxine SDK Early Access Program.

“With the Maxine developer platform, you can now experience state-of-the-art Maxine features on NVIDIA AI Foundations,” said Rochelle Pereira, Director of Engineering, Maxine Developer Platform. “You can design your deployment plan on a CSP of your choice or NVIDIA DGX Cloud, and choose your integration touch point ranging from microservice containers to SDK libraries or even ready-to-integrate NVIDIA AI Foundation Endpoints. Enhancing real-time audio and video communication in your application workflow just got a lot easier.”

New feature highlights

NVIDIA Maxine enables clear communications and increased presence for speakers in video conferences, live streams, and offline video. Maxine’s state-of-the-art AI models create high-quality effects that can be achieved with standard microphones and cameras.

Natural eye movement

The new production release of Maxine Eye Contact has smoother transitions in gaze redirection and fine-grained controls for more natural eye movement. Eye Contact is available for developers to evaluate through the Maxine Early Access Program and for production use through NVIDIA AI Enterprise.

2D photo animation

The newest Maxine release also improves video-driven Live Portrait, with greater robustness and background stability. Maxine Live Portrait has been a game-changer, enabling 2D photo animation driven by video. This release also introduces speech-driven Live Portrait, which adds speech as a new driving modality.

You can now animate 2D photos with speech, providing a sense of presence even when conditions don’t permit real-time video streaming. Combined with NVIDIA Riva translation service and NVIDIA Maxine Voice Font, speech-driven Live Portrait ushers in new possibilities in the realm of 2D animation.

Speech-driven personas

The NVIDIA Maxine video-driven and speech-driven Live Portrait animation AI microservices are ideal for anyone who does not want to appear on camera. Create a unique persona for an individual or a company using a stylized or photorealistic portrait photo. Speech-driven Live Portrait is available to select partners for feedback.

Video 1. Animate 2D portraits in real time using speech, no meshes or rigging required

Voice customization

With the new Maxine Voice Font feature, a generative AI model now available in the Maxine Early Access Program, you can customize your voice to a desired timbre. Generate a unique voice for a brand or replicate your voice for use with other translation microservices. This enables you to speak in different languages in your own voice, for example. The feature can convert audio samples into a digital voice with just 30 seconds of reference audio.
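As a practical aside, an application that collects reference audio might check the clip length before submitting it. The following is a minimal sketch using only the Python standard library. It does not call any Maxine API, and the names `MIN_REFERENCE_SECONDS`, `wav_duration_seconds`, `is_long_enough`, and `make_silent_wav` are hypothetical helpers introduced for illustration. Treating 30 seconds as a minimum threshold is an assumption based on the figure quoted above, not a documented requirement.

```python
import io
import wave

# Assumption: the "30 seconds of reference audio" figure is used here as a
# minimum threshold. Actual Voice Font requirements may differ.
MIN_REFERENCE_SECONDS = 30.0


def wav_duration_seconds(source) -> float:
    """Return the duration in seconds of a WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(source, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether a WAV clip meets the assumed minimum reference length."""
    return wav_duration_seconds(source) >= minimum


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)                   # rewind so the clip can be read back
    return buffer


if __name__ == "__main__":
    for length in (5, 31):
        clip = make_silent_wav(length)
        print(f"{length}s clip long enough: {is_long_enough(clip)}")
```

In a real application, `wav_duration_seconds` would be given the path to the recorded reference clip rather than generated silence.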

Check out the samples below to experience Voice Font.

Origin audio sample:

Reference audio sample:

Voice Font output, applying reference audio voice to origin audio:

Also available to select development partners for feedback is Studio Voice, the latest NVIDIA Maxine AI enhancement capability. With Studio Voice, you can enhance a recording from an inexpensive microphone with the characteristics of a high-end studio microphone. Studio Voice removes speech frequency degradations caused by low-quality microphones. A pretrained neural net also extends the dynamic range and bandwidth of the recording, giving the resulting audio a rich and vibrant sound.

Get a sneak preview of Studio Voice with the samples below:

Input voice on basic microphone:

Studio Voice output:

Summary

With NVIDIA Maxine, you can use AI to enhance your audio and video communication in real time. The latest Maxine release is available exclusively through NVIDIA AI Enterprise, which gives users access to enterprise support and production-ready tools, including NVIDIA Triton Inference Server. Try the newest NVIDIA Maxine features.

For early access to the latest NVIDIA Maxine features and to provide feedback, apply for the Maxine Microservices Early Access Program or Maxine SDK Early Access Program. To provide feedback on speech-driven Live Portrait or Studio Voice (not yet released), contact Greg Jones, Maxine Product Management, at gjones@nvidia.com.

You can also provide feedback through the NVIDIA Maxine and NVIDIA Broadcast App Survey to help improve Maxine features in upcoming releases.
