Simulating Realistic Traffic Behavior with a Bi-Level Imitation Learning AI Model

From last-minute cut-ins to impromptu U-turns, human drivers can be incredibly unpredictable. This unpredictability stems from the complex nature of human decision-making, which is influenced by multiple factors and varies across operational design domains (ODDs) and countries, which makes it difficult to emulate in simulation.

Yet, autonomous vehicle (AV) developers need to confidently develop and deploy systems that can operate in multiple ODDs with varying traffic behaviors. In the recently published paper, BITS: Bi-Level Imitation for Traffic Simulation, the NVIDIA Research team outlines a novel approach to simulating real-world traffic behavior that enables developers to do just that.

Bi-Level Imitation for Traffic Simulation (BITS) is a traffic model that captures the complexity of the real world with high fidelity while also outperforming previous methods. In experiments detailed in the paper, BITS improved coverage by 64% and diversity by 118% over the next best-performing model, and lowered failure rates by 36%.

*Figure 1. By decoupling the traffic modeling process, BITS enables more realistic traffic simulation. Side-by-side views of BITS planning a traffic route: one shows the prediction, the other the controller.*

Traffic modeling challenges

Most simulators model traffic behavior by either replaying recorded data or using a predefined rule-based system to govern vehicle motion.

While replaying data enables accurate review and testing of specific scenarios encountered in real-world driving, it is difficult to simulate behaviors outside of those already recorded. On the other hand, rule-based controllers are limited to simple behaviors, preventing accurate simulation of more complex situations.

There are also learning-based approaches, which are trained on real-world driving logs to predict realistic future trajectories. While these models have proven effective in creating accurate and dynamic driving paths, they struggle to produce diverse trajectories that respect road boundaries and the presence of other agents.

BITS decouples traffic modeling into two levels: a high-level predictor of the vehicle's intent and a low-level controller that carries out that intent. This lets the model synthesize a broad spectrum of traffic patterns that closely resemble real-world behavior, while still being able to generate specific scenarios.

When compared with other learning-based traffic models, BITS consistently produces more varied traffic patterns while maintaining low failure rates (Figure 2).

*Figure 2. BITS shows the highest levels of coverage and diversity and the lowest failure rates. Three bar charts compare BITS with three other learning-based models on coverage, diversity, and failure.*

The BITS approach

BITS achieves such high levels of fidelity and diversity due to its hierarchical structure.

Both levels of the model are trained on real-world traffic logs. The high-level network is trained to identify possible goals for the vehicle, and the low-level network is trained to learn a policy that reaches the predicted goal. By splitting up these tasks, we move the burden of modeling the many different trajectories a vehicle could plausibly take to the high-level goal predictor, so the low-level goal-conditioned policy can focus on reaching a given goal and operate more efficiently.
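To make this division of labor concrete, here is a minimal Python sketch of a bi-level rollout. It is not the BITS implementation: `predict_goals` and `goal_conditioned_policy` are hypothetical stand-ins for the learned high-level and low-level networks, using random goal sampling and a simple steering rule instead of trained models. What it illustrates is that the variety (several possible intents) lives in the high-level goal sampler, while the low-level policy deterministically drives toward whatever goal it is given.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class AgentState:
    x: float        # position (m)
    y: float
    heading: float  # radians
    speed: float    # m/s


def predict_goals(state: AgentState, num_goals: int, rng: random.Random) -> List[Point]:
    """Hypothetical stand-in for the high-level goal predictor.

    A learned model would output a distribution over goals conditioned on the map
    and nearby agents; here we simply sample goals ahead of the agent with a random
    look-ahead distance and heading offset to mimic several plausible intents.
    """
    goals = []
    for _ in range(num_goals):
        dist = state.speed * 4.0 + rng.uniform(5.0, 15.0)
        angle = state.heading + rng.uniform(-0.5, 0.5)
        goals.append((state.x + dist * math.cos(angle), state.y + dist * math.sin(angle)))
    return goals


def goal_conditioned_policy(state: AgentState, goal: Point, dt: float = 0.1,
                            horizon: int = 20, max_speed: float = 15.0) -> List[Point]:
    """Hypothetical stand-in for the low-level policy: deterministically steers toward `goal`."""
    x, y, heading, speed = state.x, state.y, state.heading, state.speed
    traj = []
    for _ in range(horizon):
        desired = math.atan2(goal[1] - y, goal[0] - x)
        # Wrap the heading error to [-pi, pi) and turn part of the way toward the goal.
        err = (desired - heading + math.pi) % (2 * math.pi) - math.pi
        heading += 0.3 * err
        speed = min(max_speed, speed + 0.5)  # gentle acceleration up to a speed cap
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        traj.append((x, y))
    return traj


def bi_level_rollout(state: AgentState, num_goals: int = 5, seed: int = 0) -> List[List[Point]]:
    """Sample several goals (high level), then roll out one trajectory per goal (low level)."""
    rng = random.Random(seed)
    return [goal_conditioned_policy(state, g) for g in predict_goals(state, num_goals, rng)]


trajectories = bi_level_rollout(AgentState(x=0.0, y=0.0, heading=0.0, speed=10.0))
```

Because the low-level policy in this sketch is deterministic, calling it twice with the same goal yields the same trajectory; changing the seed changes only which goals are sampled.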

BITS also includes a prediction-and-planning module to help stabilize the model in new environments and over longer time horizons. It does this by evaluating the candidate trajectories the model produces and selecting those consistent with plausible driving behavior. This reduces the risk of the simulation drifting away from reasonable behavior.
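One way to picture this selection step is to score each candidate trajectory against the predicted trajectories of other agents and a drivable-area check, then keep the lowest-cost one. The sketch below assumes a simple hand-written cost (steps in collision, steps off-road, and the high-level model's log-probability for each goal) with illustrative weights; the function names, cost terms, weights, and toy scene are assumptions, not the paper's exact formulation.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def collision_steps(ego: Trajectory, others: Sequence[Trajectory], radius: float = 2.0) -> int:
    """Count time steps at which the ego is within `radius` meters of any predicted agent."""
    hits = 0
    for t, (ex, ey) in enumerate(ego):
        for other in others:
            if t < len(other) and math.hypot(ex - other[t][0], ey - other[t][1]) < radius:
                hits += 1
                break  # count each time step at most once
    return hits


def offroad_steps(ego: Trajectory, on_road: Callable[[Point], bool]) -> int:
    """Count time steps at which the ego is outside the drivable area."""
    return sum(1 for p in ego if not on_road(p))


def select_plan(candidates: Sequence[Trajectory], predicted_others: Sequence[Trajectory],
                on_road: Callable[[Point], bool], goal_log_probs: Sequence[float],
                w_coll: float = 10.0, w_off: float = 5.0) -> Optional[int]:
    """Return the index of the lowest-cost candidate (illustrative cost and weights)."""
    best_idx, best_cost = None, math.inf
    for i, traj in enumerate(candidates):
        cost = (w_coll * collision_steps(traj, predicted_others)
                + w_off * offroad_steps(traj, on_road)
                - goal_log_probs[i])  # favor goals the high-level model finds likely
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx


# Toy scene (hypothetical): a straight road 8 m wide centered on y = 0,
# with one other agent predicted to drive along y = 3.
def on_road(p: Point) -> bool:
    return abs(p[1]) <= 4.0


straight = [(float(t), 0.0) for t in range(1, 11)]
swerve = [(float(t), 0.9 * t) for t in range(1, 11)]   # drifts off the road
overlap = [(float(t), 3.0) for t in range(1, 11)]      # drives through the other agent
others = [[(float(t), 3.0) for t in range(1, 11)]]

best = select_plan([straight, swerve, overlap], others, on_road, [-1.0, -0.5, -0.2])
```

Even though the swerving and overlapping candidates come from goals with higher (toy) log-probabilities, their collision and off-road penalties dominate, so the straight trajectory (index 0) is selected.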

Evaluating BITS quality

Determining whether a traffic model's behavior is realistic, and whether it can generate accurate yet previously unseen scenarios, is difficult because there is no ground truth for direct comparison: a simulated rollout that differs from the recorded log is not necessarily wrong. Evaluating the BITS traffic model therefore presents its own challenge.

As detailed in BITS: Bi-Level Imitation for Traffic Simulation, we divide our evaluation into three domains: rollout metrics (coverage, diversity, and failure rates), statistical differences compared to the real world, and resemblance to human drivers.

The first domain directly measures the low-level network in terms of its coverage area, the diversity across runs, and the frequency of collisions or off-road driving incidents. The second domain compares the speed and jerk of the simulated cars with real-world data. The third domain measures how human-like the behavior is by comparing the simulated trajectories with a prediction model that forecasts the agent's future position at a given timestamp.
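The first two evaluation domains can be approximated with simple trajectory statistics. The sketch below is an illustration under stated assumptions rather than the paper's metric code: coverage is taken to be the number of distinct grid cells visited, diversity the mean pairwise distance between the final positions of repeated rollouts of the same scene, jerk is computed by finite differences of scalar speed, and simulated and real samples are compared with a 1D Wasserstein distance. The paper's exact definitions may differ.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def failure_rate(collided: Sequence[bool], offroad: Sequence[bool]) -> float:
    """Fraction of rollouts that collided or left the road (or both)."""
    failures = sum(1 for c, o in zip(collided, offroad) if c or o)
    return failures / len(collided)


def coverage(trajs: Sequence[Trajectory], cell: float = 1.0) -> int:
    """Assumed definition: distinct grid cells (side `cell` meters) visited by any trajectory."""
    return len({(math.floor(x / cell), math.floor(y / cell)) for t in trajs for x, y in t})


def diversity(trials: Sequence[Trajectory]) -> float:
    """Assumed definition: mean pairwise final-position distance across repeated rollouts."""
    pairs = list(combinations(trials, 2))
    if not pairs:
        raise ValueError("need at least two rollouts")
    return sum(math.dist(a[-1], b[-1]) for a, b in pairs) / len(pairs)


def finite_diff(values: Sequence[float], dt: float) -> List[float]:
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def speeds(traj: Trajectory, dt: float) -> List[float]:
    return [math.dist(p, q) / dt for p, q in zip(traj, traj[1:])]


def jerks(traj: Trajectory, dt: float) -> List[float]:
    """Scalar jerk: second finite difference of speed (i.e., derivative of acceleration)."""
    return finite_diff(finite_diff(speeds(traj, dt), dt), dt)


def wasserstein_1d(sim: Sequence[float], real: Sequence[float]) -> float:
    """W1 between two equal-size empirical samples: mean gap between sorted values."""
    if len(sim) != len(real):
        raise ValueError("this simple form assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(sim), sorted(real))) / len(sim)


constant_velocity = [(float(t), 0.0) for t in range(6)]
sim_jerk = jerks(constant_velocity, dt=0.5)
```

A lower Wasserstein distance between simulated and logged speed or jerk samples means the simulated distribution is statistically closer to real driving.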

*Figure 3. Comparison of trajectories planned by various learning-based traffic models. 2D sketches of car trajectories from four traffic models over five trials: the TPP and TrafficSim models show little variety across repeated trials, while BITS produces different trajectories in all five trials.*

As Figures 2 and 3 show, other models tend to trade diversity for repeated behaviors across trials, whereas BITS charts a new scenario in each trial while keeping failure rates lower.

Conclusion

The ability to model realistic traffic behavior in simulation is critical to developing robust AV technology. By improving both fidelity and diversity, BITS brings AI-generated traffic simulation even closer to the complexity of the real world. We aim to further develop and refine BITS, and ultimately integrate it into the production NVIDIA DRIVE Sim pipeline.

We invite the industry to use and contribute to this ongoing simulation work, which is open source at NVlabs/traffic-behavior-simulation on GitHub. We are also building and open-sourcing trajdata, a software tool that unifies data formats from different AV datasets and transforms scenes from existing datasets into interactive simulation environments.