End-to-End Driving at Scale with Hydra-MDP

GIF of several autonomous driving views.

Jun 17, 2024

By Viola Wu, Zhiding Yu, Zhenxin Li, Shiyi Lan and Jose M. Alvarez

Building an autonomous system to navigate the complex physical world is extremely challenging. The system must perceive its environment and make quick, sensible decisions. Passenger experience also matters, and it depends on factors such as acceleration, curvature, smoothness, road adherence, and time-to-collision.

In this post, we introduce Hydra-MDP, an innovative framework that advances the field of end-to-end autonomous driving. Hydra-MDP uses a novel multi-teacher, student-teacher knowledge distillation architecture, integrating knowledge from both human and rule-based planners. This enables the model to learn diverse trajectories, improving generalization across a wide range of driving environments and conditions.

Hydra-MDP provides a universal framework showing how machine learning-based planning can be enhanced by rule-based planners. This integration ensures the model not only mimics human driving behaviors but also adheres to traffic rules and safety standards, addressing traditional imitation learning limitations.

Hydra-MDP’s data-driven scaling laws demonstrate its robustness and adaptability. By using pretrained foundation models with extensive data and GPU hours, Hydra-MDP showcases its scalability and potential for continuous improvement.

The NVIDIA model Hydra-MDP won first place and the innovation award in the E2E Driving at Scale Challenge at CVPR 2024, outperforming state-of-the-art planners on the nuPlan benchmark. It offers a promising roadmap for the application of ML-based planning systems in autonomous driving.

Video 1. End-to-end autonomous driving refers to a holistic approach where a system takes in raw sensor data from cameras, radar, and lidar, and directly outputs vehicle controls.

Enhancing multimodal planning through multi-target hydra-distillation

Developing Hydra-MDP taught us several critical lessons that shaped its architecture and success. Hydra-MDP combines human and rule-based knowledge distillation to create a robust and versatile autonomous driving model.

Here are the key lessons we learned:

Embrace the complexity of multimodal and multi-target planning
Embrace the power of multi-target hydra-distillation
Explicit modeling of safety
Overcome the limitations of post-processing
Understand the importance of environmental context
Refine iteratively through simulation
Use effective model ensembling

Embrace the complexity of multimodal and multi-target planning

A foundational lesson was the necessity of embracing both multimodal and multi-target planning.

Traditional end-to-end autonomous driving systems often focus on single-modal and single-target objectives, limiting their real-world effectiveness. Hydra-MDP integrates diverse trajectories tailored to multiple metrics, including safety, efficiency, and comfort. This ensures that the model adapts to complex driving environments rather than just mimicking human drivers.

Embrace the power of multi-target hydra-distillation

Multi-target Hydra-distillation, a teacher-student multimodal framework, was a pivotal strategy in our approach. By employing multiple specialized teachers—both human and rule-based—the model learns to predict trajectories that align with various simulation-based metrics. This technique enhances the model’s generalization across diverse driving conditions.

We learned that incorporating rule-based planners provided a structured framework, while human teachers introduced adaptability and nuanced decision-making capabilities, essential for navigating unpredictable scenarios.
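To make the multi-teacher idea more concrete, here is a minimal sketch of what a combined training loss could look like. It assumes the planner scores a fixed set of candidate trajectories, that the human teacher provides a soft target distribution over those candidates, and that each rule-based teacher provides a score in [0, 1] per candidate. The function names, the metric names, and the equal weighting of the terms are illustrative assumptions, not the exact formulation used in Hydra-MDP.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def cross_entropy(target_probs, logits):
    """Cross-entropy between a soft target distribution and predicted logits."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target_probs, log_probs))


def binary_cross_entropy(target, logit, eps=1e-12):
    """Binary cross-entropy for one candidate; the prediction is a raw logit."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p + eps) + (1.0 - target) * math.log(1.0 - p + eps))


def hydra_distillation_loss(imitation_logits, human_target, metric_logits, metric_targets):
    """Sum of one imitation term (human teacher) and one term per metric head.

    imitation_logits: one logit per candidate trajectory.
    human_target:     soft distribution over candidates (assumed given).
    metric_logits:    {metric_name: [logit per candidate]} from the student heads.
    metric_targets:   {metric_name: [score in [0, 1] per candidate]} from rule-based teachers.
    """
    loss = cross_entropy(human_target, imitation_logits)
    for name, logits in metric_logits.items():
        targets = metric_targets[name]
        # Average the per-candidate BCE so each head contributes on the same scale.
        loss += sum(binary_cross_entropy(t, z) for t, z in zip(targets, logits)) / len(logits)
    return loss


# Hypothetical example with three candidates and two rule-based metric heads.
human_target = [0.0, 1.0, 0.0]
metric_targets = {"no_collision": [1.0, 1.0, 0.0], "drivable_area": [1.0, 0.0, 1.0]}
student_heads = {"no_collision": [5.0, 5.0, -5.0], "drivable_area": [5.0, -5.0, 5.0]}
print(hydra_distillation_loss([-5.0, 5.0, -5.0], human_target, student_heads, metric_targets))
```

Keeping one head per teacher means each source of supervision stays separately inspectable, which is one plausible reason to prefer this layout over a single blended target.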

Explicit modeling of safety

Hydra-MDP introduces a novel approach to modeling safety-related driving scores. This sets it apart from previous end-to-end driving systems, which rely predominantly on imitation learning from human-demonstrated trajectories. Those methods focus on mimicking human behavior and often lack sensitivity to the safety implications of individual driving decisions.

In contrast, Hydra-MDP directly learns the consequences of different planning decisions by leveraging future perception ground truth data. This direct learning process allows Hydra-MDP to develop a robust sense of safety, even when trained on a relatively small dataset of approximately 100 hours.

The approach also reduces the overall cost of collecting extensive driving data by maximizing the utility of limited data. Because it achieves strong safety awareness with minimal data, Hydra-MDP offers an effective and efficient way to cold-start end-to-end driving systems, facilitating quicker and safer deployment of autonomous driving.

Overcome the limitations of post-processing

Another insight was the inherent limitations of relying on post-processing for trajectory selection.

Traditional methods often lose valuable information by separating perception and planning into distinct, non-differentiable steps. Hydra-MDP’s end-to-end architecture integrates perception and planning in a seamless pipeline and maintains the richness of environmental data throughout the decision-making process. This integration enables more informed and accurate predictions.

Understand the importance of environmental context

Incorporating detailed environmental context is crucial for accurate planning.

Hydra-MDP’s perception network builds on the Transfuser baseline, combining features from LiDAR and camera inputs. This multimodal fusion helps the model better understand and react to complex driving environments.

Transformer layers connect these modalities, ensuring thorough encoding of environmental context and providing rich, actionable insights.
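As a rough illustration of how attention can mix information across sensors, the sketch below runs single-head self-attention over the concatenation of camera and LiDAR tokens. It is deliberately simplified: there are no learned query, key, or value projections, and the function name and token layout are assumptions for illustration rather than the Transfuser or Hydra-MDP implementation.

```python
import math


def attention_fuse(camera_tokens, lidar_tokens):
    """Self-attention over the joint token set with identity Q/K/V projections.

    Each output token is a softmax-weighted average of all input tokens, so
    camera tokens can absorb LiDAR information and vice versa.
    """
    tokens = camera_tokens + lidar_tokens
    dim = len(tokens[0])
    fused = []
    for query in tokens:
        # Scaled dot-product similarity between this token and every token.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        fused.append([sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)])
    n_cam = len(camera_tokens)
    return fused[:n_cam], fused[n_cam:]


# One toy token per modality, with orthogonal features.
fused_camera, fused_lidar = attention_fuse([[1.0, 0.0]], [[0.0, 1.0]])
print(fused_camera, fused_lidar)
```

After fusion, the camera token carries a nonzero component along the LiDAR feature direction, which is the cross-modal mixing that the transformer layers provide in the full model.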

Refine iteratively through simulation

The iterative refinement process, facilitated by offline simulations, proved invaluable.

Running simulations on the entire training dataset generated ground truth simulation scores for various metrics. This data was then used to supervise the training process, enabling the model to learn from a wide range of simulated driving scenarios.
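The labeling step can be pictured with a toy example. The sketch below scores each candidate trajectory against logged future obstacle positions and a straight lane, producing per-candidate targets that a model could be trained on. The two scoring rules, the safety radius, and the lane half-width are hypothetical stand-ins for a real simulator, chosen only to keep the example self-contained.

```python
import math


def no_collision_score(trajectory, future_obstacles, safety_radius=1.5):
    """Return 1.0 if the ego never comes within safety_radius of an obstacle.

    trajectory:       [(x, y)] ego waypoints, one per future time step.
    future_obstacles: [[(x, y), ...]] logged obstacle positions per time step.
    """
    for (x, y), obstacles in zip(trajectory, future_obstacles):
        for ox, oy in obstacles:
            if math.hypot(x - ox, y - oy) < safety_radius:
                return 0.0
    return 1.0


def drivable_area_score(trajectory, half_width=3.5):
    """Return 1.0 if every waypoint stays within a straight lane centered on y = 0."""
    return 1.0 if all(abs(y) <= half_width for _, y in trajectory) else 0.0


def label_scene(candidates, future_obstacles):
    """Compute per-candidate target scores for one training scene."""
    return {
        "no_collision": [no_collision_score(t, future_obstacles) for t in candidates],
        "drivable_area": [drivable_area_score(t) for t in candidates],
    }


# Two candidates: drive straight, or swerve left out of the lane.
candidates = [
    [(5.0, 0.0), (10.0, 0.0), (15.0, 0.0)],
    [(5.0, 1.0), (10.0, 3.0), (15.0, 5.0)],
]
# One logged vehicle that ends up directly in the straight path.
future_obstacles = [[(20.0, 0.0)], [(18.0, 0.0)], [(15.5, 0.0)]]
print(label_scene(candidates, future_obstacles))
# → {'no_collision': [0.0, 1.0], 'drivable_area': [1.0, 0.0]}
```

Neither candidate is ideal here, which illustrates why several metrics are tracked separately: the straight path collides, while the swerve avoids the collision but leaves the drivable area.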

This step highlighted the importance of extensive simulation in bridging the gap between theoretical performance and real-world applicability.

Method	Image Resolution	Backbone	Pretraining	NC	DAC	EP	TTC	C	Score
Hydra-MDP-A	256 × 1024	ViT-L	Depth Anything	98.4	97.7	85.0	94.5	100	89.9
Hydra-MDP-B	512 × 2048	V2-99	DD3D	98.4	97.8	86.5	93.9	100	90.3
Hydra-MDP-C	256 × 1024 / 256 × 1024 / 512 × 2048	ViT-L / ViT-L / V2-99	Depth Anything / Objects365 + COCO / DD3D	98.7	98.2	86.5	95.0	100	91.0

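Table 1. Performance of Hydra-MDP as a function of input image resolution, pretraining, and backbone architecture. NC: no at-fault collisions; DAC: drivable area compliance; EP: ego progress; TTC: time-to-collision; C: comfort. The winning solution, Hydra-MDP-C, combines multiple encoder configurations to boost performance.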
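For readers wondering how the sub-scores relate to the final column, the sketch below assumes that Score is the PDM score as defined by the NAVSIM benchmark: the two safety terms (NC, no at-fault collisions, and DAC, drivable area compliance) act as multiplicative gates on a weighted average of EP (ego progress), TTC (time-to-collision), and C (comfort). This formula is not stated in the post, so treat it as an assumption. The score is computed per scenario and then averaged, so plugging the column averages from Table 1 into the function will not exactly reproduce the reported Score.

```python
def pdm_score(nc, dac, ep, ttc, c):
    """Per-scenario score with all inputs in [0, 1] (assumed NAVSIM definition).

    nc and dac gate the result: a collision or leaving the drivable area
    drives the whole score toward zero regardless of the other terms.
    """
    return nc * dac * (5.0 * ep + 5.0 * ttc + 2.0 * c) / 12.0


# A safe scenario where the ego only makes half of the expected progress.
print(round(pdm_score(nc=1.0, dac=1.0, ep=0.5, ttc=1.0, c=1.0), 3))
# → 0.792
```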

Use effective model ensembling

Effective model ensembling was critical to our success.

We used techniques like Mixture of Encoders and Sub-score Ensembling to combine model strengths. This improved Hydra-MDP’s robustness and ensured that the final model could handle a diverse array of driving scenarios with high accuracy.

Ensembling techniques balance computational efficiency and performance, which is crucial for real-time applications.
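One way to picture sub-score ensembling is sketched below: several models score the same set of candidate trajectories, their per-metric scores are averaged, and the final trajectory is the candidate with the best combined score. The plain averaging, the log-weighted selection rule, and all names and numbers are assumptions made for illustration; the post does not spell out the exact combination used.

```python
import math


def ensemble_subscores(per_model_scores):
    """Average each metric's per-candidate scores across models.

    per_model_scores: list of {metric_name: [score per candidate]}, where every
    model scores the same candidates in the same order.
    """
    n_models = len(per_model_scores)
    metrics = per_model_scores[0].keys()
    return {
        m: [sum(vals) / n_models for vals in zip(*(scores[m] for scores in per_model_scores))]
        for m in metrics
    }


def select_trajectory(scores, weights, eps=1e-6):
    """Return the index of the candidate with the highest weighted sum of log scores."""
    n_candidates = len(next(iter(scores.values())))

    def total(i):
        return sum(w * math.log(scores[m][i] + eps) for m, w in weights.items())

    return max(range(n_candidates), key=total)


# Two hypothetical models scoring three shared candidates.
model_a = {"imitation": [0.6, 0.3, 0.1], "no_collision": [0.2, 0.9, 0.9]}
model_b = {"imitation": [0.5, 0.4, 0.1], "no_collision": [0.4, 0.95, 0.8]}
combined = ensemble_subscores([model_a, model_b])
best = select_trajectory(combined, {"imitation": 1.0, "no_collision": 1.0})
print(best)
# → 1
```

In this example, candidate 0 looks most human-like but is likely to collide, so the combined score prefers candidate 1. Ensembling at the score level only requires the models to share a candidate set, not an architecture.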

Conclusion

Developing Hydra-MDP was a journey of innovation, experimentation, and continuous learning. By embracing multimodal and multi-target planning, leveraging multi-target hydra-distillation, and refining through extensive simulations, we created a model that significantly outperforms existing state-of-the-art methods. These lessons contributed to Hydra-MDP’s success and provided valuable insights for future advancements in autonomous driving.

For more information, see Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation. For related work, see AV Applied Research.

About the Authors

About Viola Wu
Viola Wu (吴紫华) is a senior product marketing manager at NVIDIA, handling the positioning, messaging, and branding of the end-to-end platform for autonomous vehicle development. She produces and hosts the DRIVE Labs video series. Viola holds engineering degrees from Rensselaer Polytechnic Institute and University College London, with a background in research, engineering, consulting, and product development. She was named one of Business Insider’s 2022 Top 31 Self-Driving Industry Power Players.

About Zhiding Yu
Zhiding Yu is a principal research scientist and research lead at NVIDIA Research. Before joining NVIDIA, he received his Ph.D. in ECE from Carnegie Mellon University in 2017. His research interests include Transformers, foundation models, and multimodal LLMs, with their applications to building the next-generation general intelligence. He has received multiple best paper awards and won several challenges. At NVIDIA, he has led numerous efforts to develop state-of-the-art models, data engines, and autonomous driving systems.

About Zhenxin Li
Zhenxin Li is pursuing an M.Sc. at Fudan University. His research interests include 3D perception and autonomous driving. He holds a B.Sc. from Fudan University. He is currently an intern on the NVIDIA Autonomous Driving Applied Research team.

About Shiyi Lan
Shiyi Lan is a senior research scientist on the NVIDIA Autonomous Vehicle Applied Research team. Before joining NVIDIA, he received his Ph.D. in computer science from the University of Maryland, College Park in 2022. His research interests include end-to-end autonomous driving, 3D perception, embodied AI, and vision-language models.

About Jose M. Alvarez
Jose M. Alvarez is a research director at NVIDIA, leading the Autonomous Vehicle Applied Research team. The team maximizes the impact of the latest research advances on the AV product. Research areas include model-centric and data-centric deep learning toward more efficient and scalable systems. Jose completed his Ph.D. in computer science in Barcelona, specializing in road-scene understanding for autonomous driving when datasets were very limited. He also worked as a postdoctoral researcher at NYU under Yann LeCun.
