NVIDIA Project Mellon
NVIDIA’s Project Mellon adds natural language commands to interactive applications. Project Mellon is a lightweight Python package that harnesses the power of large language models (LLMs) and speech AI to transform user experiences. NVIDIA Speech AI can dramatically enhance the human-software interface.
How does Project Mellon work?
Project Mellon Key Features and Benefits
- Zero-shot language models, so there is no need to train or fine-tune a language model
- Python API for issuing commands and parameters to the application’s native fulfillment logic
- Easy to use with multiple LLMs
- Natural language allows a broader group of users to use a developer’s application
- Easy to extend speech control to English, Spanish, German, and Russian using NVIDIA Riva
- Use natural language to command and control complex applications
- Ability to use remote services for automatic speech recognition (ASR), text-to-speech (TTS), and natural language understanding (NLU); the local install is only a small Python package
- Immersion in XR applications is not hindered by invasive GUIs
Enable Speech Commands in Extended Reality (XR) Applications
Mellon dispenses with the need for menu systems and memorized hand-controller functions. Users speak in natural language to control their experience.
Simplify the User Experience With a Familiar Interface
Natural language is the most human interface. The Mellon toolkit lets users navigate otherwise complex GUIs with nothing more than their voice.
Easily Harness the Power of Large Language Models
The Mellon Python package is lightweight and easy to implement. A simple API hands commands and parameters to the application’s own fulfillment logic.
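The hand-off described above can be pictured as a small command registry that routes a parsed intent and its parameters to a handler the application owns. This is a minimal sketch of that pattern, not Mellon’s actual API: the names `CommandRouter`, `register`, and `dispatch` are illustrative assumptions.

```python
# Hypothetical sketch of an LLM-to-application command hand-off.
# CommandRouter, register, and dispatch are illustrative names,
# not part of Project Mellon's real API.

class CommandRouter:
    """Routes a parsed command name plus parameters to app fulfillment logic."""

    def __init__(self):
        self._handlers = {}

    def register(self, command, handler):
        # Map a command name (as produced by the LLM/NLU step) to a callable.
        self._handlers[command] = handler

    def dispatch(self, command, **params):
        # Hand the command's parameters to the application's own handler.
        handler = self._handlers.get(command)
        if handler is None:
            return f"Unknown command: {command}"
        return handler(**params)


# Application-side fulfillment logic: the app decides what each command does.
def set_lighting(intensity=1.0):
    return f"Lighting set to {intensity}"

router = CommandRouter()
router.register("set_lighting", set_lighting)

# An utterance such as "dim the lights to 40 percent" would be parsed by the
# speech/LLM layer into a command name and parameters, then dispatched:
print(router.dispatch("set_lighting", intensity=0.4))  # Lighting set to 0.4
```

The key design point is that the language model never executes anything itself; it only produces a command name and parameters, and the application’s existing logic fulfills them.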
Replacing visually invasive user interfaces with voice commands means deeper, uninterrupted immersion in XR applications.
Project Mellon Use Cases
Executive Design Reviews in Extended Reality
It’s no longer necessary to train a user to operate the controls inside an XR experience. The user can simply speak to the experience to drive it, allowing a broad range of users to interact immersively with the digital asset.
Speech AI fosters creative interactions by freeing artists from the constraints of typical experts-only, button-and-menu-driven user interfaces. Change lighting conditions, camera parameters, environments, and the scene itself with simple voice commands.
Voice-enabled remote control of robots means no joystick and no specialized controllers to learn: just talk to your robot to help it do its work.
Learn more about the release of Project Mellon in our announcement blog.