What is NVIDIA Maxine?
NVIDIA Maxine is a suite of GPU-accelerated AI SDKs and cloud-native microservices for deploying AI features that enhance audio, video, and augmented reality effects in real time. Maxine’s state-of-the-art models create high quality effects that can be achieved with standard microphone and camera equipment. Maxine can be deployed on premises, in the cloud, or at the edge.
Succeeding while working remotely, on the road, or in a customer service center requires increased presence, so video conferencing services and communications platforms must enable workers to be seen and heard clearly. Personal engagement increases when audio and video quality improves on these platforms, and shared eye contact during video calls helps improve interpersonal connection.
NVIDIA Maxine is part of NVIDIA AI Enterprise. NVIDIA AI Enterprise is an extensive library of full-stack software, including AI solution workflows, frameworks, pretrained models, and infrastructure optimization.
What Are The Benefits of NVIDIA Maxine?
State-of-the-Art NVIDIA AI Capabilities
NVIDIA Maxine offers world-class pretrained models for developers to deploy premium augmented reality, audio and video quality features.
Real-Time AI Performance
Maxine includes accelerated and optimized AI features for real-time inference on GPUs, resulting in low-latency audio, video, and AR effects with high network resilience.
Complete AI Pipeline
Maxine offers video decode, transcode, encode, conversational AI, computer vision, video streaming, and analytics to complete your AI pipeline.
Multi-Cloud, Customizable Deployment
Maxine’s cloud-native microservices allow for flexible, fast deployment and updates.
Access NVIDIA Maxine Microservices
Maxine’s cloud-native microservices allow developers to build real-time AI applications for high-quality audio and video communications. The microservices are ready-to-use, containerized cloud applications built from Maxine algorithms. Each package contains an end-to-end application with its necessary dependencies, can be easily deployed on public and private clouds, and lets client applications provide the benefits of NVIDIA Maxine algorithms via cloud-based GPU computing. Microservices can be independently managed and deployed within the application, shortening development time.
Audio Effects Microservice offers the following GPU-accelerated AI-based audio effects:
- Speaker Focus
- Noise removal
- Room echo removal
- Audio Super-resolution
- Acoustic echo cancellation
Video Effects Microservice offers the following GPU-accelerated AI-based video effects:
- Virtual Background
- Eye Contact
Live Portrait Microservice contains the Live Portrait feature, which animates a person's portrait photo through their live video feed by matching the head movement and facial expressions to the photo.
Discover the NVIDIA Maxine SDKs
Audio Effects SDK
The Audio Effects SDK delivers multi-effect, low-latency audio quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.
High-performance, optimized AI models enable users to process thousands of audio streams per GPU in real time, enhancing audio quality by up to two mean-opinion-score points in subjective and objective quality metrics, including Perceptual Evaluation of Speech Quality and Perceptual Objective Listening Quality Analysis. In desktop applications, the optimized models allow other applications, such as games, to run concurrently with minimal impact on the quality of either application.
Developers can integrate the Audio Effects SDK into standalone Windows and Linux applications to process microphone and speaker audio, or into high-density servers for processing thousands of audio streams per server.
Key features include:
- [Updated] Audio Super Resolution: Improves audio quality by increasing the temporal resolution of the audio signal. It currently supports upsampling from 8 kHz to 16 kHz and from 16 kHz to 48 kHz. Updated with enhanced quality and over 50% lower latency.
- Acoustic Echo Cancellation: Cancels real-time acoustic device echo from the input audio stream, eliminating mismatched acoustic pairs and double-talk. With AI-based technology, more effective cancellation is achieved than with traditional digital signal processing.
- Noise Removal: Removes common background noise using state-of-the-art AI models, while preserving the speaker’s natural voice.
- Room Echo Removal: Removes reverberations from audio using state-of-the-art AI models, restoring the clarity of a speaker’s voice.
- [Updated] Speaker Focus: Separates the audio tracks of foreground and background speakers, making each voice more intelligible. Now in general availability.
Using these features, developers can also create innovative multi-effects by combining Noise Removal and Room Echo Removal, or Speaker Focus and Noise Removal, while delivering optimized, real-time performance.
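To make the Audio Super Resolution sample rates listed above concrete, the following is a minimal sketch of what upsampling from 8 kHz to 16 kHz, or from 16 kHz to 48 kHz, means for the number of samples. It uses plain linear interpolation as a stand-in and is not the Maxine AI model or its API; the function name `upsample_linear` is hypothetical. An AI-based approach aims to reconstruct high-frequency detail that simple interpolation like this cannot recover.

```python
def upsample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation upsampler (illustrative baseline only).

    This is NOT the Maxine Audio Super Resolution model; it only shows how
    the sample count grows when the sample rate increases.
    """
    if dst_rate % src_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = dst_rate // src_rate
    out = []
    for i, cur in enumerate(samples):
        # Hold the last sample when there is no next sample to interpolate to.
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        for k in range(factor):
            out.append(cur + (nxt - cur) * k / factor)
    return out


narrowband = [0.0, 1.0, 0.0, -1.0]                      # 4 samples at 8 kHz
wideband = upsample_linear(narrowband, 8_000, 16_000)   # 8 samples at 16 kHz
fullband = upsample_linear(wideband, 16_000, 48_000)    # 24 samples at 48 kHz
```

The two calls mirror the two supported conversions: the first doubles the sample count and the second triples it.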
Video Effects SDK
Maxine’s Video Effects SDK enables AI-based visual effects that run with standard webcam input and can be easily integrated into video conference pipelines. The underlying deep learning models are optimized with NVIDIA AI using NVIDIA TensorRT for high-performance inference, making it possible for developers to apply multiple effects in real-time applications.
Key features include:
- [Updated] Virtual Background: Segments a person and applies AI-powered background removal, replacement, or blur. Now includes enhanced temporal stability. Updated with latency improvements.
- Super Resolution: Generates a detail-enhanced video using neural networks that reduce artifacts and preserve texture with up to 4X high-quality scaling.
- Upscaler: Delivers high-throughput and up to 4X high-quality scaled video with an adjustable sharpening parameter.
- Artifact Reduction: Reduces compression artifacts from encoded video while preserving original details.
- Video Noise Removal: Removes low-light camera noise introduced in the video capture process while preserving details.
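As an illustration of the Virtual Background feature listed above, background replacement can be thought of as two steps: segment the person to get a per-pixel mask, then blend the camera frame with a new background using that mask. The sketch below covers only the second step with standard alpha compositing. It assumes the mask is already available (in Maxine it would come from the AI segmentation model), and the function name `composite` and the data layout are hypothetical, not part of the Video Effects SDK.

```python
def composite(foreground, background, mask):
    """Blend two RGB images per pixel using a segmentation mask.

    foreground, background: rows of (r, g, b) tuples with values 0-255.
    mask: rows of floats in [0, 1]; 1.0 means "person", 0.0 means "background".
    The mask is assumed to be given; producing it is the AI model's job.
    """
    out = []
    for fg_row, bg_row, m_row in zip(foreground, background, mask):
        out_row = []
        for fg, bg, m in zip(fg_row, bg_row, m_row):
            # Standard alpha blend: m * foreground + (1 - m) * background.
            out_row.append(tuple(round(m * f + (1.0 - m) * b) for f, b in zip(fg, bg)))
        out.append(out_row)
    return out


camera_frame = [[(200, 100, 50), (200, 100, 50)]]   # 1x2 image from the webcam
new_background = [[(0, 0, 255), (0, 0, 255)]]       # 1x2 replacement background
person_mask = [[1.0, 0.0]]                          # left pixel is the person

result = composite(camera_frame, new_background, person_mask)
```

Background blur follows the same pattern, with a blurred copy of the camera frame used in place of `new_background`.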
Augmented Reality SDK
The Augmented Reality SDK offers AI-powered, real-time 3D face tracking and body pose estimation based on a standard webcam feed. Developers can create unique AR effects, such as overlaying 3D content on a face or driving 3D characters and virtual interactions in real time.
Key features include:
- [Updated] Face Expression Estimation: Tracks the face and infers the subject’s expression. Estimated blendshape coefficients are used to animate a properly rigged model to accurately mirror the subject’s expression. Updated with an enhanced AI model, a new six-degree-of-freedom (6DOF) head pose, and a new face model with updated blendshapes and face area partitioning.
- [Updated] Eye Contact: Simulates eye contact by estimating and aligning gaze with the camera. Updated with performance improvements via CUDA graph functionality.
- Face Tracking: Detects human faces in images and videos and specifies location and size of the bounding box.
- [Updated] Face Landmark Tracking: Recognizes facial features and contours using 126 key points. It also tracks head pose and facial deformation due to head movement and expression in three degrees of freedom in real time. Now with a Quality mode for even higher-quality tracking.
- [Updated] Face Mesh: Represents a human face as a 3D mesh with up to 3,000 vertices and six degrees of freedom. Now includes a 3D morphable model from the USC Institute for Creative Technologies.
- Body Pose Estimation: Predicts and tracks 34 key points of the human body in 2D and 3D. Commonly used in activity recognition, motion transfer, and virtual interactions in real time.
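The Face Expression Estimation entry above mentions that estimated blendshape coefficients animate a rigged model. A common way to do this is a linear blendshape model, where each vertex is the neutral position plus a weighted sum of per-blendshape offsets. The sketch below assumes that formulation; it is not taken from the Augmented Reality SDK, and the function name `apply_blendshapes`, the data layout, and the blendshape name `jawOpen` are hypothetical.

```python
def apply_blendshapes(neutral, deltas, coefficients):
    """Deform a mesh with a linear blendshape model (illustrative sketch).

    neutral: list of (x, y, z) vertices for the neutral face.
    deltas: dict mapping blendshape name -> list of (dx, dy, dz) offsets,
            one per vertex, describing that expression at full strength.
    coefficients: dict mapping blendshape name -> weight, typically in [0, 1],
                  such as the values an expression estimator would output.
    """
    result = [list(v) for v in neutral]
    for name, weight in coefficients.items():
        for vertex, delta in zip(result, deltas[name]):
            for axis in range(3):
                vertex[axis] += weight * delta[axis]
    return [tuple(v) for v in result]


# Two vertices on the chin; "jawOpen" (hypothetical name) moves them down.
neutral_face = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shape_deltas = {"jawOpen": [(0.0, -1.0, 0.0), (0.0, -1.0, 0.0)]}

# A coefficient of 0.5 opens the jaw halfway.
animated = apply_blendshapes(neutral_face, shape_deltas, {"jawOpen": 0.5})
```

Because the model is linear, several expressions can be active at once; their offsets simply add up for each vertex.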
Maxine Builds on Powerful NVIDIA AI SDKs
Explore technologies that integrate with Maxine’s modular, customizable, and scalable pipeline. To enable better communication and understanding, Maxine integrates NVIDIA Riva’s real-time translation and text-to-speech capabilities with photo animation “Live Portrait” and Maxine’s Eye Contact features.
NVIDIA Maxine Resources

NVIDIA Omniverse Avatar Cloud Engine
Omniverse ACE is a collection of cloud-based AI models and services for developers to easily build, customize, and deploy interactive avatars.

Avaya Delivers Enhanced Video Conferencing Experience From the Cloud with NVIDIA Maxine
Avaya’s new cloud media processing framework delivers high-quality real-time voice and video with minimal latency while supporting innovative AI algorithms provided by Maxine.

Reimagining Virtual Meetings With Speech AI and Deep Learning
With Maxine integrated into Pexip’s flexible, secure digital infrastructure, these advanced features are delivered at the server level, meaning all participants in the video meeting will have the same enhanced experience.
Want to help improve NVIDIA Broadcast App features? Check out our interactive crowdsource page.
Discover NVIDIA AI technologies
Read about the latest developer software released at GTC 2022, including tools for conversational AI, inference, and more.
Watch the GTC 2022 keynote
Learn about the latest updates to NVIDIA Maxine from NVIDIA CEO Jensen Huang.
Read the latest Maxine news
Read how leading collaboration, content creation, and streaming providers are using NVIDIA Maxine.
NVIDIA Maxine on GitHub
NVIDIA Maxine source code is available on GitHub.