NVIDIA Project Mellon

NVIDIA’s Project Mellon adds natural language commands to interactive applications. Project Mellon is a lightweight Python package that harnesses large language models (LLMs) and speech AI to transform user experiences. NVIDIA Speech AI can dramatically enhance the human-software interface.


How does Project Mellon work?


This is a typical Mellon configuration. It shows how the toolkit uses NVIDIA RIVA for automatic speech recognition (ASR) and text-to-speech (TTS), together with large language models (LLMs) such as the NVIDIA NeMo service for natural language understanding (NLU), to translate natural language into application-specific commands.
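To make the flow above concrete, here is a minimal Python sketch of one conversational turn: speech in, text, a structured command, the application's own fulfillment logic, and a spoken reply. It is not Project Mellon's API. Every function below (`recognize_speech`, `understand`, `synthesize_speech`, `handle_utterance`, `demo_fulfill`) is a hypothetical stand-in, and a simple keyword rule replaces the LLM so the sketch runs offline. A real deployment would call NVIDIA RIVA for ASR/TTS and an LLM for NLU at the marked points.

```python
# Hypothetical sketch of the ASR -> NLU -> fulfillment -> TTS loop.
# None of these functions are Project Mellon APIs; they are illustrative stubs.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Command:
    """An application-specific command extracted from an utterance."""
    name: str
    params: dict = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    """Stub for ASR (e.g. NVIDIA RIVA): here the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8")


def understand(text: str) -> Command:
    """Stub for LLM-based NLU: a keyword rule stands in for a zero-shot prompt."""
    words = text.lower().split()
    if "light" in words or "lights" in words:
        # Use the first number in the utterance as the level, defaulting to 100.
        level = next((int(w.rstrip("%")) for w in words if w.rstrip("%").isdigit()), 100)
        return Command("set_lighting", {"level": level})
    return Command("unknown", {"text": text})


def synthesize_speech(text: str) -> bytes:
    """Stub for TTS (e.g. NVIDIA RIVA): return the reply as UTF-8 bytes."""
    return text.encode("utf-8")


def handle_utterance(audio: bytes, fulfill: Callable[[Command], str]) -> bytes:
    """Run one turn: ASR, then NLU, then the app's fulfillment logic, then TTS."""
    text = recognize_speech(audio)
    command = understand(text)
    reply = fulfill(command)
    return synthesize_speech(reply)


def demo_fulfill(cmd: Command) -> str:
    """Hypothetical application-side handler for the extracted command."""
    if cmd.name == "set_lighting":
        return f"Lighting set to {cmd.params['level']} percent."
    return "Sorry, I did not understand."


if __name__ == "__main__":
    print(handle_utterance(b"Dim the lights to 40%", demo_fulfill).decode())
```

Keeping the application's fulfillment logic behind a single callable means the speech and language services can be swapped (local or remote, different LLMs) without touching the application code.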

Project Mellon Key Features and Benefits

For Developers:

Zero-shot language models: no language model training required
Python API for issuing commands and parameters to the application’s native fulfillment logic
Easy to use with multiple LLMs
Natural language allows a broader group of users to use the developer’s application
Easy to extend speech control to English, Spanish, German, and Russian with NVIDIA RIVA

For Users:

Use natural language to command and control complex applications
Ability to use remote services for automatic speech recognition (ASR), text-to-speech (TTS), and natural language understanding (NLU); the local install is only a small Python package
Immersion in XR applications is not hindered by invasive GUIs

Enable Speech Commands in Extended Reality (XR) Applications

Mellon removes the need for menu systems and for memorizing hand-controller functions. Users simply speak in natural language to control their experience.

Simplify the User Experience With a Familiar Interface

Natural language is the most human interface. The Mellon toolkit lets users navigate otherwise complex GUIs with nothing more than their voice.

Easily Harness the Power of Large Language Models

The Mellon Python package is lightweight and easy to integrate. A simple API hands commands and parameters to the application’s own fulfillment logic.
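One common way to hand commands and parameters to existing application code is a registry that maps command names to the application's own functions. The Python sketch below illustrates that pattern only; it is not Mellon's actual API. `CommandRegistry`, `register`, `describe`, `dispatch`, `set_camera`, and the JSON shape `{"command": ..., "params": {...}}` are all hypothetical, assuming the LLM is prompted to emit a structured command in that form.

```python
# Hypothetical command-dispatch sketch; not the Project Mellon API.
import inspect
import json
from typing import Any, Callable, Dict, List


class CommandRegistry:
    """Maps command names to the application's native handler functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that registers a handler under a command name."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._handlers[name] = func
            return func
        return decorator

    def describe(self) -> Dict[str, List[str]]:
        """List each command and its parameter names, e.g. to include in an LLM prompt."""
        return {
            name: list(inspect.signature(func).parameters)
            for name, func in self._handlers.items()
        }

    def dispatch(self, payload: str) -> Any:
        """Parse a JSON command such as {"command": ..., "params": {...}} and run it."""
        data = json.loads(payload)
        handler = self._handlers.get(data.get("command"))
        if handler is None:
            raise KeyError(f"unknown command: {data.get('command')!r}")
        return handler(**data.get("params", {}))


registry = CommandRegistry()


@registry.register("set_camera")
def set_camera(focal_length_mm: float, aperture: float = 2.8) -> str:
    """Stand-in for the application's existing camera logic."""
    return f"camera: {focal_length_mm} mm, f/{aperture}"


if __name__ == "__main__":
    print(registry.describe())
    print(registry.dispatch('{"command": "set_camera", "params": {"focal_length_mm": 35}}'))
```

Deriving parameter names from the handler signatures keeps the list of commands the LLM is told about in sync with the code that actually fulfills them.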

Uninterrupted Immersion

Replacing visually invasive user interfaces with voice commands means deeper, uninterrupted immersion in XR applications.

Project Mellon Use Cases

Executive Design Reviews in Extended Reality

It’s no longer necessary to train users to operate the controls inside an XR experience. Users can simply speak to the experience to drive it, allowing a broad range of people to interact immersively with the digital asset.

Virtual Production

Speech AI fosters creative interaction by freeing artists from the constraints of typical expert-only, button-and-menu-driven user interfaces. Change lighting conditions, camera parameters, environments, and the scene with simple voice commands.

Robot Interaction

Voice-enabled remote control of robots means no joystick and no specialized controllers to learn: just talk to your robot to help it do its work.

Learn more about the release of Project Mellon in our announcement blog.
