NVIDIA Maxine Microservices - Early Access Program

This early access program is available to a limited number of applicants, selected based on use case and deployment infrastructure fit. Please note that a mutual NDA must be executed before we grant access to the Maxine Early Access Program, and applications must be submitted from your organization's email domain.


The Maxine Early Access Program is best suited for application developers from the following segments:

  • Providers of video conferencing, unified communications services or communications platforms
  • Providers of video streaming platforms or content delivery platforms
  • Application developers or content creators who would like to use cloud-native microservices to integrate with client-side applications
  • More generally, anyone who would like to integrate Maxine features into backend infrastructure or client-side software applications

This early access program includes Maxine’s new cloud-native UCF-compliant Audio Effects Microservice, Video Effects Microservice, and Live Portrait Microservice.



Production Features

Audio Effects Microservices [UPDATED]

  • Updated Audio Super-resolution model with improved performance
  • Ada (L4/L40) support on Linux


Video Effects Microservices [UPDATED]

  • Added support for Gaze LookAway and enhanced occlusion handling in Eye Contact, producing a more natural gaze and more robust behavior when the face is occluded.
  • Added support for Background image replacement.
  • Added support for deployment-time pipeline tuning.
  • Ada (L4/L40) support on Linux


Live Portrait Microservices [UPDATED]

  • Added support for a high-resolution model, which outputs a fixed resolution of 1024x1024, with the Mode 1 + Quality setting.
  • Added webcam feed frame selection, which allows selecting the portrait image from a stream.
  • Added support for Mode 3 (Registration Blending).
  • Added support for Secure RTP (SRTP) input streams.
  • Added support for deployment-time pipeline tuning.
  • Ada (L4/L40) support on Linux


Early Access Features

Voice Font Microservices [NEW]
Voice Font Microservices provides real-time, streaming, zero-shot voice conversion, in which the speaker’s timbre in the input audio is converted to that of the reference audio while the linguistic content and prosody of the input are retained.

  • Supports two reference audio modes:
      • Offline conversion with just 30 seconds of reference speech (user reference mode)
      • Reference audio streamed in on the fly; the reference is automatically improved as more data is collected
  • Supports two input modes (gRPC only):
      • Streaming input audio: audio to be converted is streamed in as 800 ms chunks and streamed back in real time (processing takes < 800 ms)
      • PTT mode: audio to be converted is provided all at once (up to 45 seconds) and is returned as a single chunk
  • Supports 16 kHz and 48 kHz input audio sample rates over gRPC and RTP
  • Supports multiple concurrent users: 4 streams over RTP/gRPC in mode, 8/16 streams over gRPC, depending on GPU.
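
To make the streaming numbers above concrete, here is a minimal Python sketch of how a client might slice mono audio into 800 ms chunks at the two listed input rates. It does not use any Maxine API; the function names (samples_per_chunk, split_into_chunks) are hypothetical, and letting the final chunk be shorter than 800 ms is an assumption made for illustration, not documented service behavior.

```python
from typing import Iterator, Sequence

CHUNK_MS = 800  # chunk duration stated for the streaming input mode


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Number of mono samples in one chunk at the given sample rate."""
    if sample_rate_hz not in (16_000, 48_000):
        raise ValueError("only 16 kHz and 48 kHz input are listed as supported")
    return sample_rate_hz * chunk_ms // 1000


def split_into_chunks(
    samples: Sequence[int], sample_rate_hz: int
) -> Iterator[Sequence[int]]:
    """Yield consecutive 800 ms slices; the final slice may be shorter
    (an assumption of this sketch, not a documented requirement)."""
    step = samples_per_chunk(sample_rate_hz)
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


if __name__ == "__main__":
    two_seconds = [0] * (2 * 16_000)  # 2 s of silence at 16 kHz
    print([len(c) for c in split_into_chunks(two_seconds, 16_000)])
    # prints [12800, 12800, 6400]
```

At 16 kHz each chunk holds 12,800 samples, and at 48 kHz it holds 38,400. How chunks are actually framed and sent over gRPC or RTP depends on the service definition provided with the program.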

Maxine SDK Early Access

If you’re looking for the Maxine SDK Early Access Program, click here.



