Project Mellon is a lightweight Python package capable of harnessing the heavyweight power of speech AI (NVIDIA Riva) and large language models (LLMs) (NVIDIA NeMo service) to simplify user interactions in immersive environments. NVIDIA announced at NVIDIA GTC 2023 that developers can start testing Project Mellon to explore creating hands-free extended reality (XR) experiences controlled by natural-language voice commands.
Words can move mountains, as J.R.R. Tolkien’s riddle guarding the Doors of Durin (“Speak, friend, and enter”) reminds us. The fundamental idea behind Project Mellon is that the power of speech AI and LLMs can be harnessed in a practical way to open doors, and do much more, in the virtual world.
In XR, user interfaces can be complicated and difficult to use, disrupting the sense of natural immersion that is the essence of virtual, mixed, and augmented realities. Project Mellon enables developers of almost any application, whether in XR or flat-screen worlds, to easily add natural language understanding to their software as a new kind of human-centered, hands-free user interface.
The Project Mellon platform consists of the following:
- Project Mellon SDK
- NVIDIA Riva for automatic speech recognition (ASR), text-to-speech (TTS), and neural machine translation (NMT)
- NeMo service (other LLMs are also supported)
Key release features in Project Mellon 1.0 include:
- Lightweight, easily integrated Python library
- LLM support for accurate natural-language understanding
- Zero-shot language models that eliminate the need for command-specific training
- Natural language command support with conversational and visual context
- Support for asking questions about commands and the scene, with natural language responses
- Simple Python API for command understanding and execution
- Web-based test application
- ASR, TTS, LLM, and neural machine translation (NMT) services can run on-premises or be hosted remotely, with low-latency response times
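To make the zero-shot idea in the feature list more concrete, here is a minimal sketch of how an application might describe its commands to a general-purpose LLM and validate the reply. It is not the Project Mellon API: the command names, the `build_prompt` and `understand` helpers, the JSON reply format, and the `fake_llm` stand-in are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict

# Hypothetical command catalog; names and descriptions are illustrative only.
COMMANDS: Dict[str, str] = {
    "open_door": "Open the named door in the scene.",
    "set_color": "Change the color of the named object.",
}


def build_prompt(utterance: str, commands: Dict[str, str]) -> str:
    """Describe the available commands in plain text so that a general-purpose
    LLM can pick one without any command-specific training (zero-shot)."""
    lines = [
        "Choose one command and reply only with JSON of the form",
        '{"command": "<name>", "args": {}}.',
        "Commands:",
    ]
    for name, description in commands.items():
        lines.append(f"- {name}: {description}")
    lines.append(f"User said: {utterance}")
    return "\n".join(lines)


def understand(utterance: str,
               llm: Callable[[str], str],
               commands: Dict[str, str]) -> dict:
    """Send the prompt to the LLM and check that the reply names a known command."""
    reply = llm(build_prompt(utterance, commands))
    parsed = json.loads(reply)
    if not isinstance(parsed, dict) or parsed.get("command") not in commands:
        raise ValueError(f"LLM reply does not name a known command: {reply!r}")
    return parsed


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same canned reply."""
    return '{"command": "open_door", "args": {"target": "main door"}}'


result = understand("Open the main door, please", fake_llm, COMMANDS)
print(result["command"], result["args"])
```

In a real application, `fake_llm` would be replaced by a call to whichever hosted or on-premises LLM is in use, and the transcribed text from the ASR service would supply the utterance.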
Join ESI Group at GTC 2023 for the session “Collaborating with AIs in Virtual Reality: Immersive Digital Assistants” to learn about their research on how teams in a dynamic, collaborative XR environment can benefit from the use of conversational AIs.
“We’ve found that integrating a conversational AI with NVIDIA Project Mellon allows us to lower the entry threshold to collaborative XR technology and humanize the user experience within IC.IDO Weave,” said Jan Wurster, Solution and Technology Expert at ESI Group. “With the use of natural speech as an input, our virtual AI assistant can help teams work through review tasks, query available situations or find issues—all simply by asking in natural language and with no need to memorize specific commands.”
Developers can get started with Project Mellon today. Watch the Project Mellon demo to learn how to conduct design reviews, make real-time configuration changes, control a robot, and manipulate cameras and scene elements, all driven by natural voice commands.
AI is changing the way we interact with our work and tools. With speech AI and Project Mellon, developers can simplify and humanize the user experience. It is no longer necessary to train users to operate every feature in virtual reality (VR). You can jump into a VR application and control the experience with your own words.