Accelerated SDK with state-of-the-art AI features for building virtual collaboration and content creation applications.
What Is NVIDIA Maxine?
NVIDIA Maxine is a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation applications such as video conferencing and live streaming.
Maxine’s AI SDKs—Video Effects, Audio Effects, and Augmented Reality (AR)—are highly optimized and include modular features that can be chained into end-to-end pipelines to deliver the highest performance possible on GPUs, both on PCs and in data centers. Maxine can also be used with NVIDIA Riva, an SDK for building conversational AI applications, to offer world-class language-based capabilities such as transcription and translation.
Developers can add Maxine AI effects into their existing applications or develop new pipelines from scratch using NVIDIA DeepStream, an SDK for building intelligent video analytics, and NVIDIA Video Codec, an SDK for accelerated encode, decode, and transcode.
State-of-the-Art AI Capabilities
World-class pretrained models for high-quality audio, video, and augmented reality (AR) capabilities.
Real-Time AI Performance
Accelerated and optimized AI features for real-time inference on GPUs.
Complete end-to-end pipelines for video decode, transcode, encode, conversational AI, computer vision, video streaming, and analytics.
Touchcast utilizes state-of-the-art rendering and AI technologies for running beautiful online events with stunning life-like virtual venues and real-time collaboration capabilities. As the leader in powering the next era of computing, NVIDIA Maxine is paving the future of video communications—a future where AI and neural networks enhance and enrich content in entirely new ways. By working with NVIDIA, Touchcast can continue to be at the forefront of building the world’s most incredible experiences for its clients.
Edo Segal, Founder and CEO
SoftBank Corp. is committed to providing the best communication experience possible and Maxine dramatically improves communication clarity and quality. With capabilities such as audio background noise removal and video super resolution, our users see and hear each other more clearly, making their communications more efficient and effective.
Ryuji Wakikawa, Vice President, Head of Advanced Technology Division
Pexip has always pushed the boundaries of video communications with its distributed, virtualized conferencing platform. We're exploring how NVIDIA Maxine capabilities like audio noise removal and virtual background can support premium video conferencing experiences for enterprises of all sizes. Together with NVIDIA, we look forward to providing the next generation of AI-powered video communications—creating virtual meetings that are better than meetings in person.
Giles Chamberlin, CTO and Co-founder
We believe real-time AI can take the work out of video conferencing so that people can meet without distractions. NVIDIA Maxine is the first platform that supports those real-time AI video conferencing features. Maxine allows our users to communicate more consistently and effectively, focusing on the content of the discussion instead of the distractions.
Julian Green, CEO
The exciting noise cancellation performance of the Maxine Audio SDK has proven to be easy to use and incredibly powerful. We envision using Maxine to allow our customers to have clear and intelligible conversations in situations never thought possible before.
John Chow, Product Manager
By processing our video streams with Maxine in the cloud, we are able to give our customers advanced abilities, without them having to invest in expensive equipment. According to our users, the quality of Maxine's video output, enhanced with AI features, is the best in the entire market. Working with the Maxine SDK allowed us to create state-of-the-art solutions for our customers, in record time.
Tzafrir Rehan, CTO
Maxine gives our users access to state-of-the-art, real-time, AI-driven body tracking and background removal. They can track and mask performers in a live performance setting, which in turn enables a whole world of creative use cases—and all just using a standard camera feed, eliminating the challenges of special hardware tracking solutions, which is a real game-changer. The integration of the Maxine SDK was very easy and took just a few days to complete.
Matt Swoboda, Founder and Director
NVIDIA Maxine's AI-powered features let us enhance the production quality of our game streamers, starting with dynamic and intelligent noise removal for microphones to ensure clear speech during broadcasts. We also plan on integrating other features such as video denoising and upscaling as well as background removal without a green screen in the near future.
Miguel Molina, Technical Product Manager
Video Effects SDK
Maxine’s Video Effects SDK enables AI-based visual effects that run with standard webcam input and can easily be integrated into video conference and content creation pipelines. The underlying deep learning models are optimized using NVIDIA® TensorRT™ for high-performance inference, making it possible for developers to apply multiple effects in real-time applications.
Key features include:
- Super resolution: Generates detail-enhanced video using AI neural networks that reduce artifacts and preserve texture, with up to 4X high-quality scaling.
- Upscaler: Delivers high-throughput and up to 4X high-quality scaled video with an adjustable sharpening parameter.
- Artifact reduction: Removes compression artifacts from encoded video while preserving original details.
- Video noise removal: Removes low-light camera noise introduced in the video capture process while preserving details.
- Virtual background: Segments a person and applies AI-powered background removal, replacement, or blur.
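To make the "up to 4X" scaling figure concrete, the short sketch below works out the output frame size for a given scale factor. It is plain arithmetic, not Maxine code: the function names `scaled_resolution` and `pixel_ratio` are hypothetical helpers introduced here for illustration, and the 480x270 input is an assumed example rather than a documented SDK requirement.

```python
from typing import Tuple


def scaled_resolution(width: int, height: int, factor: float) -> Tuple[int, int]:
    """Return the (width, height) of a frame after uniform scaling.

    Hypothetical helper for illustration only; it does not call the
    Video Effects SDK and makes no claim about which factors it accepts.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return round(width * factor), round(height * factor)


def pixel_ratio(factor: float) -> float:
    """Scaling both dimensions by `factor` multiplies the pixel count by factor squared."""
    return factor * factor


if __name__ == "__main__":
    # Assumed example input: a 480x270 frame scaled by 4X.
    print(scaled_resolution(480, 270, 4))  # (1920, 1080)
    print(pixel_ratio(4))                  # 16
```

A 4X scale therefore produces sixteen times as many pixels per frame, which suggests why GPU acceleration matters for doing this in real time.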
Augmented Reality SDK
The Augmented Reality SDK offers AI-based, real-time 3D face tracking and body pose estimation based on a standard web camera feed. Developers can create unique AR effects such as overlaying 3D content on a face or driving 3D characters and virtual interactions in real time.
Key features include:
- Face tracking: Detects human faces in images and videos and specifies location and size of the bounding box.
- Face landmark tracking: Recognizes facial features and contours using 126 key points and tracks head pose and facial deformation due to head movement and expression in three degrees of freedom in real time.
- Face mesh: Represents a human face with a 3D mesh with up to 3,000 vertices and six degrees of freedom.
- Body pose estimation: Predicts and tracks 34 key points of the human body in 2D and 3D. Commonly used in activity recognition, motion transfer, and virtual interactions in real time.
- Eye contact (apply for early access): Simulates eye contact by estimating and aligning gaze with the camera.
- Audio2Face (coming soon): Animates a 2D or 3D digital face with high fidelity based on just an audio input.
Audio Effects SDK
The Audio Effects SDK delivers AI-based audio quality enhancement algorithms, improving end-to-end conversation quality for narrowband, wideband, and ultra-wideband audio.
High-performance, optimized AI models allow thousands of audio streams to be processed in real time per GPU, enhancing the audio quality by up to two mean-opinion-score (MOS) points in subjective and objective quality metrics such as Perceptual Evaluation of Speech Quality (PESQ) and Perceptual Objective Listening Quality Analysis (POLQA). In desktop applications, the optimized models allow other applications, such as games, to run concurrently with minimal impact on the quality of either.
Developers can integrate the SDK into standalone Windows and Linux applications to process microphone and speaker audio, or into high-density servers to process thousands of audio streams per server.
Key features include:
- Noise removal (NR): Removes several common background noises using state-of-the-art AI models while preserving the speaker’s natural voice.
- Room echo removal (REC): Removes reverberations from audio using state-of-the-art AI models, restoring clarity of a speaker’s voice.
- Audio super resolution (apply for early access): Improves real-time audio quality by upsampling the audio input stream from an 8 kHz to a 16 kHz sampling rate, or from 16 kHz to 48 kHz.
- Acoustic echo cancellation (apply for early access): Cancels real-time acoustic device echo from the input audio stream. With AI-based technology, more effective cancellation is achieved than with traditional digital signal processing.
Using these features, developers can also create innovative multi-effects by combining NR and REC while delivering optimized performance and real-time latency.
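The idea of combining effects into a multi-effect can be sketched as simple function composition. The example below is an illustration only and does not use the Audio Effects SDK: `chain`, `toy_gate`, and `toy_attenuate` are hypothetical names, and the two toy stages are trivial stand-ins for where noise removal and room echo removal would sit in a real pipeline.

```python
from functools import reduce
from typing import Callable, List, Sequence

AudioFrame = List[float]                    # one frame of mono samples
Effect = Callable[[AudioFrame], AudioFrame]


def chain(effects: Sequence[Effect]) -> Effect:
    """Compose effects so each stage receives the previous stage's output."""
    def run(frame: AudioFrame) -> AudioFrame:
        return reduce(lambda acc, effect: effect(acc), effects, frame)
    return run


def toy_gate(frame: AudioFrame) -> AudioFrame:
    # Stand-in for a noise removal stage (assumption): zeroes samples
    # whose magnitude is below 0.05. Not an AI model.
    return [s if abs(s) >= 0.05 else 0.0 for s in frame]


def toy_attenuate(frame: AudioFrame) -> AudioFrame:
    # Stand-in for a room echo removal stage (assumption): halves
    # every sample. Not an AI model.
    return [s * 0.5 for s in frame]


if __name__ == "__main__":
    multi_effect = chain([toy_gate, toy_attenuate])
    print(multi_effect([0.01, 0.5, -0.02, -0.8]))  # [0.0, 0.25, 0.0, -0.4]
```

Keeping each stage behind the same frame-in, frame-out signature is one way to let stages be reordered or swapped without touching the rest of the pipeline.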
Project Maxine Builds on Powerful NVIDIA SDKs
Explore technologies that integrate with Maxine’s modular, customizable, and scalable pipeline. For example, collaboration with global audiences can be dramatically improved when speaking in their language. To enable better communication and understanding, Project Maxine integrates NVIDIA Riva’s real-time translation and text-to-speech with photo animation “live portrait” and eye contact in real time. Project Maxine is a reference application for Omniverse Avatar, a technology platform for generating interactive AI avatars.
Video Encode and Decode
The Video Codec SDK is a comprehensive set of APIs, including high-performance tools, samples, and documentation, for hardware-accelerated video encode and decode on Windows and Linux. AI Face Codec (coming soon) will enable smoother video and bandwidth reduction of up to 10X.
Reinvent Video Applications
Learn how developers from Notch, Headroom, Be.Live, and Touchcast are using NVIDIA Maxine.
GTC 2021 Keynote
Learn about the latest update for NVIDIA Maxine from NVIDIA’s CEO, Jensen Huang.
NVIDIA Maxine is free to download for members of the NVIDIA Developer Program.