Reinventing Real-Time Video Communication with AI

What Is NVIDIA Maxine?

NVIDIA Maxine is a suite of GPU-accelerated SDKs that reinvent audio and video communications with AI, elevating standard microphones and cameras for clear online communications. Maxine provides state-of-the-art real-time AI audio, video, and augmented reality features that can be built into customizable, end-to-end deep learning pipelines.

Maxine’s AI SDKs (Video Effects, Audio Effects, and Augmented Reality) are highly optimized and scalable to deliver the highest possible performance on GPUs, whether on PCs, in data centers, or in the cloud. Maxine can also be used with NVIDIA Riva, an SDK for building conversational AI applications, to add world-class language-based capabilities such as transcription and translation, and with the NVIDIA Video Codec SDK for accelerated encode, decode, and transcode.

Available on PC, data center, and cloud.

What Are the Benefits of NVIDIA Maxine?

State-of-the-Art AI Capabilities

World-class pre-trained models for high-quality audio, video, and augmented reality (AR) capabilities.

Real-Time AI Performance

Accelerated and optimized AI features for real-time inference on GPUs.

End-to-End Solution

Complete end-to-end pipelines for video decode, transcode, encode, conversational AI, computer vision, video streaming, and analytics.


Maxine SDKs

Audio Effects SDK

The Audio Effects SDK delivers AI-based audio quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.

High-performance, optimized AI models allow a single GPU to process thousands of audio streams in real time, improving audio quality by up to two mean opinion score (MOS) points on subjective and objective quality metrics such as Perceptual Evaluation of Speech Quality (PESQ) and Perceptual Objective Listening Quality Analysis (POLQA). In desktop applications, the optimized models allow multiple applications, such as games, to run concurrently with minimal impact on the quality of any of them.

Developers can integrate the SDK into standalone Windows and Linux applications to process microphone and speaker audio, or into high-density servers to process thousands of audio streams per server.

Key features include:

  • Audio super resolution: Improves real-time audio quality by upsampling the input audio stream from an 8 kHz to a 16 kHz sampling rate, and from 16 kHz to 48 kHz.
  • Acoustic echo cancellation: Cancels acoustic device echo from the input audio stream in real time, even with mismatched acoustic pairs and during double-talk. AI-based cancellation is more effective than traditional digital signal processing (DSP).
  • Noise removal: Removes several common background noises using state-of-the-art AI models while preserving the speaker’s natural voice.
  • Room echo removal: Removes reverberations from audio using state-of-the-art AI models, restoring clarity of a speaker’s voice.

Using these features, developers can also create multi-effect chains, for example combining Noise Removal and Room Echo Removal, while maintaining optimized performance and real-time latency.
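
To make the audio super resolution rates concrete, the sketch below computes how many samples one frame holds at each rate and upsamples a frame with plain linear interpolation. The 10 ms frame length and the helper names frame_samples and linear_upsample are hypothetical, chosen for illustration. Linear interpolation is a conventional non-AI baseline, not the model the Audio Effects SDK uses: it raises the sample rate but cannot recover high-frequency detail that the narrowband signal never captured, which is the gap an AI super resolution model is meant to fill.

```python
def frame_samples(rate_hz: int, frame_ms: int = 10) -> int:
    """Samples in one frame at rate_hz (10 ms frames are an assumption)."""
    return rate_hz * frame_ms // 1000


def linear_upsample(samples: list, factor: int) -> list:
    """Non-AI baseline: raise the sample rate by an integer factor.

    Inserts (factor - 1) linearly interpolated values after each input
    sample, so the output length is len(samples) * factor. It adds no new
    high-frequency content, which is what AI super resolution targets.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for i, current in enumerate(samples):
        # Interpolate toward the next sample; hold the last value at the end.
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        for k in range(factor):
            out.append(current + (nxt - current) * k / factor)
    return out


if __name__ == "__main__":
    for rate in (8_000, 16_000, 48_000):
        print(rate, frame_samples(rate))  # 80, 160, and 480 samples

    frame_8k = [0.0] * frame_samples(8_000)
    frame_16k = linear_upsample(frame_8k, 16_000 // 8_000)   # factor 2
    frame_48k = linear_upsample(frame_16k, 48_000 // 16_000)  # factor 3
    print(len(frame_16k), len(frame_48k))  # 160 480
```

Both conversions named above use integer factors (2 and 3), so this sketch needs no fractional resampling.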

Get started with the Audio Effects SDK  

Video Effects SDK

Maxine’s Video Effects SDK enables AI-based visual effects that run with standard webcam input and can easily be integrated into video conference and content creation pipelines. The underlying deep learning models are optimized using NVIDIA® TensorRT™ for high-performance inference, making it possible for developers to apply multiple effects in real-time applications.

Key features include:

  • Super resolution: Generates detail-enhanced video with up to 4X high-quality scaling, using neural networks that reduce artifacts and preserve texture.
  • Upscaler: Delivers high-throughput, high-quality scaling of up to 4X, with an adjustable sharpening parameter.
  • Artifact reduction: Removes compression artifacts from encoded video while preserving original details.
  • Video noise removal: Removes low-light camera noise introduced in the video capture process while preserving details.
  • Virtual background: Segments a person and applies AI-powered background removal, replacement, or blur.
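
One common way to implement a virtual background is per-pixel alpha compositing: a segmentation mask says how much of each pixel belongs to the person, and the output blends the camera frame with a new background accordingly. The following is a minimal sketch of that blend in plain Python, assuming the mask is already available. The function name composite and the flat pixel-list layout are hypothetical and do not reflect the Video Effects SDK's actual API or image formats.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]


def composite(frame: List[Pixel], background: List[Pixel],
              mask: List[float]) -> List[Pixel]:
    """Alpha-blend a camera frame over a background: out = m*fg + (1-m)*bg.

    mask[i] is an assumed segmentation output for pixel i: 1.0 means
    person, 0.0 means background, values in between soften the edges.
    """
    if not (len(frame) == len(background) == len(mask)):
        raise ValueError("frame, background, and mask must be the same size")
    out: List[Pixel] = []
    for fg, bg, m in zip(frame, background, mask):
        m = min(max(m, 0.0), 1.0)  # clamp mask values into [0, 1]
        out.append(tuple(m * f + (1.0 - m) * b for f, b in zip(fg, bg)))
    return out


if __name__ == "__main__":
    red_person = [(1.0, 0.0, 0.0)] * 3
    green_screen = [(0.0, 1.0, 0.0)] * 3
    edge_mask = [1.0, 0.5, 0.0]  # inside person, edge, background
    print(composite(red_person, green_screen, edge_mask))
    # [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 1.0, 0.0)]
```

Under the same assumptions, background removal blends against a solid color, replacement blends against an image, and blur blends against a blurred copy of the frame itself.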

Get started with the Video Effects SDK  

Augmented Reality SDK

The Augmented Reality SDK offers AI-based, real-time 3D face tracking and body pose estimation based on a standard web camera feed. Developers can create unique AR effects such as overlaying 3D content on a face, driving 3D characters, and enabling virtual interactions in real time.

Key features include:

  • Face tracking: Detects human faces in images and videos and reports the location and size of each face’s bounding box.
  • Face landmark tracking: Recognizes facial features and contours using 126 key points and tracks head pose and facial deformation due to head movement and expression in three degrees of freedom in real time.
  • Face mesh: Represents a human face with a 3D mesh with up to 3,000 vertices and six degrees of freedom.
  • Body pose estimation: Predicts and tracks 34 key points of the human body in 2D and 3D. Commonly used in activity recognition, motion transfer, and virtual interactions in real time.
  • Eye contact (apply for early access): Simulates eye contact by estimating the user’s gaze and aligning it with the camera.
  • Audio2Face (coming soon): Animates a 2D or 3D digital face with high fidelity based on just an audio input.
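
The degree-of-freedom counts above can be read as follows: three rotational degrees of freedom (yaw, pitch, roll) describe head orientation, and a six-degree-of-freedom pose adds three translational degrees (x, y, z). That reading, the Z-Y-X rotation order, and the helper names rotation_matrix and apply_pose are assumptions for illustration. The sketch shows how such a pose moves the vertices of a face mesh, not how the Augmented Reality SDK represents poses internally.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def rotation_matrix(yaw: float, pitch: float, roll: float) -> Mat3:
    """3 rotational DoF: R = Rz(yaw) @ Ry(pitch) @ Rx(roll), in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def apply_pose(vertices: Sequence[Vec3], rotation: Mat3,
               translation: Vec3) -> List[Vec3]:
    """6 DoF rigid pose: rotate each vertex, then translate by (tx, ty, tz)."""
    out: List[Vec3] = []
    for v in vertices:
        out.append(tuple(
            sum(rotation[row][col] * v[col] for col in range(3)) + translation[row]
            for row in range(3)
        ))
    return out


if __name__ == "__main__":
    # Turn the head 90 degrees (yaw) and move it 5 units along z.
    R = rotation_matrix(math.pi / 2, 0.0, 0.0)
    # Approximately [(0.0, 1.0, 5.0)], up to floating-point rounding.
    print(apply_pose([(1.0, 0.0, 0.0)], R, (0.0, 0.0, 5.0)))
```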

Get started with the Augmented Reality SDK  

Project Maxine Builds on Powerful NVIDIA SDKs

Explore technologies that integrate with Maxine’s modular, customizable, and scalable pipeline. For example, collaboration with global audiences can improve dramatically when speakers can be heard in the audience’s own language. To enable better communication and understanding, Project Maxine combines NVIDIA Riva’s real-time translation and text-to-speech with Maxine’s “live portrait” photo animation and eye contact features, all running in real time. Project Maxine is a reference application for Omniverse Avatar, a technology platform for generating interactive AI avatars.

Omniverse Avatar

Omniverse Avatar is a technology platform for developing interactive AI avatars. It connects NVIDIA’s core AI technologies: NVIDIA Riva for speech AI, NVIDIA Metropolis for computer vision, NVIDIA NeMo Megatron for natural language understanding, and NVIDIA Merlin™ for recommendation engines.

GPU-Accelerated Video Encode and Decode

The Video Codec SDK is a comprehensive set of APIs, including high-performance tools, samples, and documentation, for hardware-accelerated video encode and decode on Windows and Linux.

Speech AI

NVIDIA Riva is a GPU-accelerated SDK for building speech AI applications that delivers real-time performance on GPUs.



Reinvent Video Applications

Learn how developers from Notch, Headroom, Be.Live, and Touchcast are using NVIDIA Maxine.

New AI Technologies

Read about the latest developer software tools released at GTC 2021.

GTC 2022 Keynote

Learn about the latest updates to NVIDIA Maxine from NVIDIA’s CEO, Jensen Huang.

Latest Maxine News

Read how leading collaboration, content creation, and streaming providers are using NVIDIA Maxine.

NVIDIA Maxine is free to download for members of the NVIDIA Developer Program.
