The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and acting. In most vehicles on the road today, in-vehicle assistants still rely on fixed command-response patterns: interpret a phrase, trigger an action, reset.
While effective for well-defined tasks, this approach doesn’t scale to modern expectations, where drivers and passengers want conversational assistants that can handle ambiguity, manage multi-step tasks, and adapt to context that evolves throughout the journey.
Large language models (LLMs), vision-language models (VLMs), and speech models enable a fundamentally new interaction paradigm. Rather than relying on command matching, these models support conversational AI with memory and reasoning, multimodal interaction across voice, vision, and telemetry, and context-aware, proactive assistance that anticipates user needs instead of simply reacting to requests.

The range of experiences this unlocks is significant. Intelligent routines—such as calendar-aware greetings and smart home integration—become seamless. Drivers gain real-time, contextual explanations of their surroundings and ADAS behavior, building trust through transparency. Natural-language diagnostics enable predictive maintenance without requiring technical expertise. At the same time, personalized comfort modes tailored to children or elderly passengers become both practical and intuitive to implement.
At scale, the opportunity is substantial. According to ABI Research, global shipments of vehicles with agentic AI are expected to grow to 70 million by 2035 from around 5 million in 2025. Delivering these experiences within the vehicle—where strict latency, safety, and privacy requirements apply—is a true systems engineering challenge. Moreover, an in-vehicle AI assistant cannot operate in isolation; it must seamlessly integrate with cloud-based AI agents and external services to extend its capabilities.
This post walks through the architecture, tooling, and deployment path for building a production-grade agentic cabin assistant leveraging NVIDIA DRIVE.
The core challenge: Real-time AI at the edge
Replacing an intent classification pipeline with a reasoning loop requires substantially more on-device compute. A production agentic AI assistant running on-device needs to:
- Run 7B+ parameter models locally
- Process multimodal inputs (camera, audio, telemetry)
- Maintain low latency (<500 ms response time)
- Sustain >30 tokens/sec decode throughput
- Ensure data privacy (edge-first execution)
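As a rough illustration, the latency and throughput targets above can be sanity-checked with a simple benchmark harness. The sketch below assumes a hypothetical `generate_stream` client that yields decoded tokens from whatever local inference runtime is in use; the function name and interface are placeholders, not a specific NVIDIA API.

```python
import time
from typing import Callable, Iterator

def benchmark_stream(generate_stream: Callable[[str], Iterator[str]], prompt: str):
    """Measure time-to-first-token and decode throughput for a streaming generator.

    `generate_stream` is a placeholder for whatever local inference client runs
    on the target hardware; it is assumed to yield decoded tokens as strings.
    """
    start = time.perf_counter()
    first_token_latency = None
    token_count = 0

    for _ in generate_stream(prompt):
        now = time.perf_counter()
        if first_token_latency is None:
            first_token_latency = now - start  # contributes to the <500 ms response target
        token_count += 1

    total = time.perf_counter() - start
    if first_token_latency is None:
        first_token_latency = total  # no tokens were produced
    decode_time = max(total - first_token_latency, 1e-9)
    tokens_per_sec = token_count / decode_time

    print(f"first token: {first_token_latency:.3f} s, "
          f"decode: {tokens_per_sec:.1f} tokens/s (target: > 30)")
    return first_token_latency, tokens_per_sec
```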
DRIVE AGX platforms are well suited to these requirements and can be integrated by automakers in several ways, as described below.
AI box: A dedicated platform for in-vehicle LLM acceleration
Built on DRIVE AGX, the automotive AI box provides a modular AI compute solution that augments the limited inference capabilities of traditional infotainment systems on a chip (SoCs), enabling scalable deployment of advanced LLM and VLM workloads.
As an add-on electronic control unit (ECU), the AI box can integrate seamlessly with most existing in-vehicle infotainment (IVI) systems, requiring only a lightweight interface to exchange tokens and camera data with the cockpit computer. This interface is typically Ethernet, with optional DisplayPort or camera serial interface (CSI) for video inputs. Running LLMs and VLMs locally, the AI box processes inputs from the cockpit system and returns intelligent outputs that power advanced AI assistants and rich in-vehicle experiences. This architecture enables original equipment manufacturers (OEMs) to upgrade vehicles with basic IVI systems into modern, agentic AI platforms, without extensive redesign of the IVI stack or changes to the core vehicle electronics architecture.
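To illustrate how lightweight this interface can be, the sketch below models the cockpit-to-AI-box exchange as a simple length-prefixed JSON protocol over Ethernet. The message layout, field names, address, and port are assumptions made for illustration, not a defined NVIDIA or OEM protocol.

```python
import json
import socket
import struct

AI_BOX_ADDR = ("192.168.10.2", 5555)  # placeholder address for the AI box ECU

def send_request(sock: socket.socket, payload: dict) -> None:
    """Send a length-prefixed JSON message (e.g., transcript text plus camera frame IDs)."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)

def recv_response(sock: socket.socket) -> dict:
    """Read a length-prefixed JSON response containing the assistant's output."""
    (length,) = struct.unpack("!I", sock.recv(4))
    body = b""
    while len(body) < length:
        body += sock.recv(length - len(body))
    return json.loads(body.decode("utf-8"))

# Example exchange as seen from the cockpit computer.
with socket.create_connection(AI_BOX_ADDR) as sock:
    send_request(sock, {
        "type": "assistant_request",
        "transcript": "Why did the car brake just now?",
        "camera_frames": ["front_cam_001234"],  # frames shared via CSI/DisplayPort or Ethernet
        "vehicle_state": {"speed_kph": 82, "adas_event": "aeb_triggered"},
    })
    response = recv_response(sock)
    print(response["text"])  # "text" is a hypothetical response field
```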

Compared to running advanced AI workloads directly on an infotainment SoC, the automotive AI box built on DRIVE AGX offers a purpose-built, decoupled compute platform designed for in-vehicle AI. It delivers higher performance, stronger workload isolation, and faster time to deployment for LLM and VLM applications. Key benefits include:
- Significantly higher AI compute capacity: Supports execution of larger LLMs (up to ~13B parameters) with higher and more consistent inference throughput than infotainment SoCs optimized for UI and media workloads.
- Dedicated memory bandwidth with guaranteed QoS: Provides isolated, guaranteed memory bandwidth for LLM inference, ensuring predictable performance that doesn’t degrade when concurrent infotainment, graphics, or multimedia tasks are running.
- Deterministic, high-throughput inference performance: Sustains the high token decode rates required for fluid, conversational experiences, independent of cockpit workload variability.
- Rapid time to market through production-ready platforms: Leverages automotive-grade hardware and a proven, production-ready software stack, enabling rapid deployment from day one.
- No changes to existing vehicle electronics architecture: Deploys as a modular add-on alongside existing cockpit systems, avoiding costly redesigns or requalification of IVI platforms.
- Independent AI upgrade cadence: Enables OEMs to evolve AI capabilities independently of the infotainment system, supporting frequent model and application updates without impacting UI stability, validation, or certification timelines.
The automotive AI box allows automakers to augment any vehicle with powerful agentic AI capabilities. By adding the class-leading LLM inference capabilities of DRIVE AGX, automakers can bring modern AI cockpit experiences to their cars without redesigning their existing IVI systems.
The automotive AI box is available in two configurations to scale across vehicle segments: the DRIVE AGX Orin-based AI box brings production-ready, high-performance AI to mainstream vehicles today, while the DRIVE AGX Thor-based AI box, powered by the next-generation NVIDIA Blackwell GPU architecture, unlocks the most advanced LLM-driven experiences for premium vehicles.
DRIVE AGX Thor: Multi-domain AI computer

DRIVE AGX Thor extends the capabilities of the DRIVE AGX platform with the Blackwell GPU architecture, delivering unprecedented edge inference performance. It provides the compute headroom to host both autonomous vehicle (AV) and in-vehicle AI workloads on a unified multi-domain AI computer. Thor is also built with extensive hardware and software mechanisms to ensure isolation and guarantee freedom from interference (FFI) between mixed-criticality workloads. In addition, DriveOS 7 on Thor supports multiple QNX and Linux virtual machines, enabling secure software environments for both the AV and in-vehicle AI domains.
DRIVE AGX Thor’s powerful AI performance and extensive isolation capabilities enable a new level of centralization in the vehicle’s electrical/electronic (E/E) architecture. Automakers can deploy all of a vehicle’s AI features on DRIVE AGX Thor, leveraging the same software environment and AI toolchain for both the AV and in-vehicle AI domains.
Central car computer with DRIVE AGX and MediaTek Dimensity AX

DRIVE AGX can also be paired with MediaTek’s Dimensity AX C-X1 cockpit SoCs to deliver best-in-class in-vehicle and AV experiences in a central car computer. While the C-X1 includes an NVIDIA GPU capable of LLM inference, pairing it with a DRIVE AGX SoC offloads AI workloads—enabling more concurrent models for richer multimodal use cases, or freeing the C-X1 to focus on other cockpit workloads such as high-end in-car gaming and multimedia.
In addition, the MediaTek Dimensity platform shares the DriveOS environment with DRIVE AGX Orin and Thor, providing a unified software foundation that simplifies development across both the AI and IVI domains, with high-bandwidth data such as video and audio shared efficiently and seamlessly across a PCIe link via the DriveOS NvStreams API.

Together, DRIVE AGX and MediaTek Dimensity AX offer automakers scalability, unprecedented LLM inference performance, and a unified software architecture for all AI features in the car, spanning AV as well as in-vehicle AI. MediaTek Dimensity AX can serve as the cockpit compute solution of choice across any of the architectures outlined above, integrating seamlessly with NVIDIA DRIVE AGX. Between the AI box, the multi-domain AI computer, and the centralized car computer, OEMs have multiple options for designing an E/E architecture that supports a future-proof, AI-native vehicle.
Hybrid architecture: AI inference from cloud to edge
While the DRIVE AGX and MediaTek solutions provide powerful compute options for AI assistants at the edge, many common tasks, such as web research, social media interaction, and trip planning, require integration with web APIs and agents. For these cases, cloud inference enables the use of large, powerful models to process extensive and complex user requests.
A holistic architecture combining edge and cloud AI provides the best user experience:
- Agent orchestration: Depending on user intent and current context, route to the correct local or web agents to solve the task. Often a mix of agents will be required — for example, a discussion about an upcoming trip would involve a local navigation agent, a cloud-based agent searching for scenic spots and restaurants, and a local knowledge agent explaining the destination. The assistant can also be triggered by vehicle events such as a traffic jam ahead, by external events such as incoming mail, or by its own routines. A minimal routing sketch follows this list.
- Context sharing: When cloud agents get involved, it is crucial to share relevant context with them to enable a seamless experience. It would break the experience if a cloud assistant had to ask for context the user already shared. Equally, cloud agents may possess information that is critical for local agents, such as knowing that an important meeting is upcoming and the driver should not be overloaded with information.
- UX transparency: Web research and remote tool calls may take time, and if the internet connection is interrupted, workloads dispatched to the cloud may not return. Users may not expect this if they are not properly informed. The vehicle AI assistant must track asynchronous workloads, their expected completion time, and the status of internet connectivity, and have fallback mechanisms in place.
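The sketch below illustrates one way an on-vehicle orchestrator might route a request between local and cloud agents and degrade gracefully when the cloud path is slow or unavailable. The agent functions, the intent-routing rule, and the timeout value are illustrative assumptions, not part of a specific NVIDIA SDK.

```python
import asyncio

# Illustrative local and cloud agents; in practice these would wrap on-device
# models and remote services respectively.
async def local_navigation_agent(request: str) -> str:
    return f"[local nav] reroute computed for: {request}"

async def cloud_research_agent(request: str) -> str:
    await asyncio.sleep(0.2)  # stands in for network plus cloud inference latency
    return f"[cloud] scenic stops and restaurants for: {request}"

async def handle_request(request: str, intent: str, online: bool) -> str:
    """Route to local or cloud agents based on intent, with a connectivity fallback."""
    tasks = [local_navigation_agent(request)]  # local agents always run

    if intent == "trip_planning" and online:
        tasks.append(cloud_research_agent(request))

    results = []
    for task in asyncio.as_completed(tasks, timeout=5.0):  # bounded wait keeps the UX transparent
        try:
            results.append(await task)
        except asyncio.TimeoutError:
            results.append("[fallback] cloud agent did not respond; using local results only")
            break
    return "\n".join(results)

print(asyncio.run(handle_request("weekend trip to the coast", "trip_planning", online=True)))
```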

Building a hybrid in-vehicle agentic AI pipeline
In-vehicle AI assistants understand occupant intent by interpreting both in-vehicle context and external signals, enabling them to deliver timely, relevant, and proactive responses. For more complex tasks, they seamlessly collaborate with cloud-based AI agents to perform research, access services, and extend capabilities beyond the vehicle.
Delivering this experience requires moving beyond a simple prompt-response model. The assistant must plan, invoke tools such as navigation, vehicle APIs, and knowledge systems, and iteratively execute toward the user’s goal. This is enabled by an agentic AI pipeline that integrates orchestration, tool use, and memory with robust policy enforcement and fallbacks, ensuring multi-step tasks are completed safely within in-vehicle latency and permission constraints.
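As a simplified sketch of such a plan-act loop, the example below shows an agent that repeatedly asks a model for the next action, invokes a whitelisted tool, and feeds the observation back until it can answer. The `call_llm` function, the tool registry, and the policy check are hypothetical stand-ins for the production model and vehicle APIs.

```python
# Hypothetical tool registry; real deployments would wrap navigation,
# vehicle, and knowledge-base APIs behind callables like these.
TOOLS = {
    "set_cabin_temperature": lambda args: f"cabin set to {args['celsius']} C",
    "get_route_eta": lambda args: f"ETA to {args['destination']}: 24 min",
}

ALLOWED_ACTIONS = set(TOOLS)  # policy: the agent may only invoke whitelisted tools

def call_llm(history: list[dict]) -> dict:
    """Placeholder for the on-device LLM. Returns either a tool call or a final answer."""
    if not any(turn["role"] == "tool" for turn in history):
        return {"action": "get_route_eta", "args": {"destination": "home"}}
    return {"final_answer": "You will be home in about 24 minutes."}

def run_agent(user_request: str, max_steps: int = 4) -> str:
    history = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):  # bounded loop acts as a timeout/fallback policy
        decision = call_llm(history)
        if "final_answer" in decision:
            return decision["final_answer"]
        if decision["action"] not in ALLOWED_ACTIONS:
            return "Sorry, I am not allowed to do that."
        observation = TOOLS[decision["action"]](decision["args"])
        history.append({"role": "tool", "content": observation})
    return "I could not complete that request in time."

print(run_agent("When will I get home?"))
```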
Agentic AI pipelines rely on the following key components:
- Automatic speech recognition (ASR): Converts cabin microphone audio into text (often with noise suppression and wake-word detection or endpointing) so downstream reasoning runs on a reliable transcript.
Start with: NVIDIA Nemotron speech ASR models
- Orchestrator and agent framework: Routes intents, maintains session state, selects skills or tools, and enforces policies (timeouts, fallbacks, and what the agent is allowed to change in the car).
Start with: NeMo Agent Toolkit for agent development
- LLM inference engine: Framework that handles tokenization, batching, KV-cache management, and hardware-accelerated execution so models meet real-time or near-real-time latency targets on the target SoC or cloud path.
Start with: TensorRT-LLM for server-side LLM inference, TensorRT Edge-LLM for inference on the edge, and NVIDIA NeMo to build, customize, and optimize models
- AI models: Open-source or proprietary LLM or VLM weights supply language understanding, summarization, or even visual understanding of the cabin and the vehicle exterior.
Start with: the NVIDIA Nemotron family of open models, TensorRT-LLM supported models, and TensorRT Edge-LLM supported models
- Text-to-speech (TTS): Turns the assistant’s final answer into natural spoken audio with a consistent voice, prosody suited to driving, and output formats the vehicle’s audio stack can play reliably.
Start with: NVIDIA Magpie TTS models
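As a minimal illustration of how these stages chain together in a single assistant turn, the sketch below wires placeholder `transcribe`, `run_agent`, and `synthesize` functions for the ASR, orchestrator/LLM, and TTS stages. None of these are actual SDK calls; they only stand in for the services listed above.

```python
from dataclasses import dataclass

@dataclass
class CabinInput:
    audio: bytes                 # microphone capture
    camera_frame_id: str | None  # optional cabin or exterior frame reference
    telemetry: dict              # e.g., speed, ADAS state, route info

def transcribe(audio: bytes) -> str:
    """Placeholder ASR stage (e.g., a Nemotron speech ASR model behind a service)."""
    return "why is the lane keep assist beeping"

def run_agent(transcript: str, context: dict) -> str:
    """Placeholder orchestrator + LLM stage: reasons over transcript and vehicle context."""
    return ("Lane keep assist detected the car drifting toward the lane marking "
            f"at {context.get('speed_kph', '?')} km/h, so it alerted you.")

def synthesize(text: str) -> bytes:
    """Placeholder TTS stage (e.g., a Magpie TTS model) returning playable audio."""
    return text.encode("utf-8")

def assistant_turn(cabin_input: CabinInput) -> bytes:
    """One end-to-end turn: audio in, spoken answer out."""
    transcript = transcribe(cabin_input.audio)
    answer = run_agent(transcript, cabin_input.telemetry)
    return synthesize(answer)

audio_out = assistant_turn(CabinInput(audio=b"...", camera_frame_id=None,
                                      telemetry={"speed_kph": 96}))
print(audio_out.decode("utf-8"))
```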
Figure 7, below, shows how these components connect across an agentic AI pipeline from edge to cloud.

From AI factory to in-vehicle deployment
Development of an agentic in-vehicle assistant requires a different workflow than a traditional voice command system. It begins in the AI factory, where models are trained, fine-tuned, evaluated, and integrated into agentic workflows at scale. This cloud-based environment enables rapid iteration, continuous improvement, and validation of AI assistants using enterprise data, simulation, and orchestration pipelines.

NVIDIA NeMo is an end-to-end platform for building, customizing, and deploying enterprise generative AI models. It offers tools for data curation, training, fine-tuning, evaluation, and guardrailed deployment across cloud and edge. Supporting LLMs, multimodal models, retrieval-augmented generation (RAG), and agentic workflows, NeMo delivers scalable, production-ready AI applications. Integrated with NVIDIA accelerated computing and deployable via optimized microservices, it provides high performance, security, and portability from development to real-time inference in production, including in-vehicle systems.
Deploying at the edge
Once validated, these models and pipelines are optimized for edge deployment and brought into the vehicle. CUDA and TensorRT provide a unified GPU programming model that spans both cloud and embedded environments, ensuring consistency from development to deployment. Models are further optimized—through techniques such as quantization and pruning—and deployed on the AI box using TensorRT Edge-LLM for high-performance, low-latency inference.
This seamless path from AI factory to in-vehicle execution enables continuous innovation while meeting the strict latency, privacy, and reliability requirements of in-vehicle AI.
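As a rough illustration of what post-training quantization does to model footprint, the snippet below applies PyTorch dynamic INT8 quantization to a small stand-in network. This is not the TensorRT Edge-LLM workflow; the production path would use that framework's own optimization tooling, and the model here is only an example.

```python
import os
import torch
import torch.nn as nn

# Small stand-in network; in practice this would be an LLM/VLM prepared for the edge.
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
)

# Post-training dynamic quantization: weights stored in INT8, activations quantized
# on the fly. The reduced memory footprint is what matters for edge deployment.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Serialize the state dict to disk and report its size in megabytes."""
    torch.save(m.state_dict(), "/tmp/m.pt")
    return os.path.getsize("/tmp/m.pt") / 1e6

print(f"fp32: {size_mb(model):.1f} MB, int8: {size_mb(quantized):.1f} MB")
```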

TensorRT Edge-LLM is the NVIDIA inference framework for autoregressive models, including LLMs, VLMs, and vision-language-action (VLA) models, on embedded platforms. It is designed specifically for the needs of an embedded context: low latency, low memory and compute footprint, and minimal dependencies. The framework supports the latest edge-friendly models, including the NVIDIA Nemotron family of open models. TensorRT Edge-LLM is available on GitHub as open source.

Getting started
To build in-vehicle AI applications on an AI box built on the NVIDIA DRIVE AGX platform:
- Prototype in the cloud using NeMo and NVIDIA NIM microservices
- Leverage TensorRT-LLM for optimized inference of cloud agents
- Build orchestrator and tool integration with NeMo Agent Toolkit
- Deploy via TensorRT Edge-LLM on a DRIVE AGX DevKit
- Connect with our partners to enable DRIVE AI Box production:
- Platform (hardware + software) providers: Bosch, Desay SV, Lenovo, PATEO, ThunderSoft, Visteon
- Software (model + pipeline) providers: Amazon Alexa, ArcherMind, Cerence AI, Volcano Engine
- Iterate with a hybrid edge-cloud feedback loop
The NVIDIA full-stack approach, from AI factory to DRIVE AGX on the edge, provides a production-ready pathway to bring intelligent, multimodal, agentic experiences into every vehicle.