NVIDIA DRIVE Perception



NVIDIA DRIVE™ Perception enables robust perception of obstacles, paths, and wait conditions (such as stop signs and traffic lights) right out of the box with an extensive set of pre-processing, post-processing, and fusion processing modules. Together with NVIDIA DRIVE™ Networks, these form an end-to-end perception pipeline for autonomous driving that uses data from multiple sensor types (e.g. camera, radar, LIDAR). DRIVE Perception makes it possible for developers to create new perception, sensor fusion, mapping and/or planning/control/actuation autonomous vehicle (AV) functionalities without having to first develop and validate the underlying perception building blocks.



DRIVE Perception is designed for a variety of objectives:

  • Developing perception algorithms for obstacles, paths, and wait conditions
  • Detecting and classifying objects, drivable space, lanes and road markings, and traffic lights and signs
  • Tracking detected objects (such as other vehicles, pedestrians, road markings) across frames
  • Estimating distances to detected objects
  • Fusing inputs from sensors of different modalities



Path Perception Ensemble

Path perception ensemble combines several base models and produces an optimal predictive model for drivable paths. Agreement and disagreement analysis between the base models enables generation of real-time confidence metrics.
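
To make the agreement-based confidence concrete, here is a minimal sketch, not DRIVE Perception code: each hypothetical base model's path is represented as lateral offsets sampled at fixed distances ahead, the ensemble path is the per-sample median, and confidence falls as the spread between base predictions grows. The Path representation, the combinePaths helper, and the disagreement scale are illustrative assumptions.

    // Illustrative sketch only: combines several hypothetical path predictions
    // into an ensemble estimate and derives a confidence from their agreement.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // A path is modeled here as lateral offsets (meters) sampled at fixed
    // longitudinal distances ahead of the vehicle -- an assumed representation.
    using Path = std::vector<double>;

    struct EnsembleResult {
        Path path;          // combined (ensemble) path
        double confidence;  // 1.0 = full agreement, 0.0 = strong disagreement
    };

    EnsembleResult combinePaths(const std::vector<Path>& basePaths, double disagreementScaleM = 0.5) {
        EnsembleResult result;
        if (basePaths.empty()) { result.confidence = 0.0; return result; }
        const size_t numSamples = basePaths.front().size();
        double totalSpread = 0.0;
        for (size_t i = 0; i < numSamples; ++i) {
            // Median across base models is robust to a single outlier predictor.
            std::vector<double> column;
            for (const Path& p : basePaths) column.push_back(p[i]);
            std::sort(column.begin(), column.end());
            result.path.push_back(column[column.size() / 2]);
            // Disagreement at this sample: spread between the extreme predictions.
            totalSpread += column.back() - column.front();
        }
        // Map average spread (meters) to a [0, 1] confidence; the scale is a tunable assumption.
        const double meanSpread = totalSpread / static_cast<double>(numSamples);
        result.confidence = std::exp(-meanSpread / disagreementScaleM);
        return result;
    }

    int main() {
        // Three hypothetical base models predicting lateral offsets at 10 m, 20 m, 30 m ahead.
        std::vector<Path> basePaths = {{0.10, 0.22, 0.41}, {0.12, 0.25, 0.38}, {0.09, 0.20, 0.55}};
        EnsembleResult r = combinePaths(basePaths);
        std::printf("ensemble offsets: %.2f %.2f %.2f  confidence: %.2f\n",
                    r.path[0], r.path[1], r.path[2], r.confidence);
        return 0;
    }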




Surround Camera Object Tracking

Surround camera object tracking software tracks objects such as vehicles, pedestrians, and bicyclists in camera images over time, and assigns a unique ID number to each tracked object.
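
One minimal way to picture ID assignment across frames, shown below purely as an illustration (the DRIVE Perception tracker uses feature motion and is more sophisticated): match each new detection to an existing track by bounding-box overlap and mint a new ID when nothing matches. The Box/Track types, the updateTracks helper, and the IoU threshold are assumptions.

    // Illustrative sketch only: greedy IoU matching to carry object IDs across frames.
    // Unmatched tracks are simply dropped here; a real tracker would also predict and coast.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Box { double x, y, w, h; };   // axis-aligned image-space box (assumed representation)
    struct Track { int id; Box box; };

    double iou(const Box& a, const Box& b) {
        const double x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
        const double x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
        const double inter = std::max(0.0, x2 - x1) * std::max(0.0, y2 - y1);
        return inter / (a.w * a.h + b.w * b.h - inter);
    }

    // Matches this frame's detections against existing tracks and returns the updated track list.
    std::vector<Track> updateTracks(const std::vector<Track>& tracks, const std::vector<Box>& detections,
                                    int& nextId, double iouThreshold = 0.3) {
        std::vector<Track> updated;
        std::vector<bool> used(tracks.size(), false);
        for (const Box& det : detections) {
            int best = -1;
            double bestIou = iouThreshold;
            for (size_t i = 0; i < tracks.size(); ++i) {
                const double overlap = iou(tracks[i].box, det);
                if (!used[i] && overlap >= bestIou) { best = static_cast<int>(i); bestIou = overlap; }
            }
            if (best >= 0) { used[best] = true; updated.push_back({tracks[best].id, det}); }  // keep existing ID
            else           { updated.push_back({nextId++, det}); }                            // new object, new ID
        }
        return updated;
    }

    int main() {
        int nextId = 0;
        std::vector<Track> tracks = updateTracks({}, {{100, 100, 50, 80}}, nextId);      // frame 1: new ID 0
        tracks = updateTracks(tracks, {{105, 102, 50, 80}, {300, 120, 40, 60}}, nextId); // frame 2: ID 0 kept, ID 1 added
        for (const Track& t : tracks) std::printf("id=%d at (%.0f, %.0f)\n", t.id, t.box.x, t.box.y);
        return 0;
    }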





Surround Camera-Radar Fusion

Surround camera-radar fusion is a sensor fusion layer built on top of surround camera and surround radar perception pipelines. It is designed to leverage the complementary strengths of each sensor type and provide quality semantic information as well as accurate position, velocity and acceleration estimates for objects around the car.
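
A minimal sketch of this complementary pairing, assuming camera detections carry a class label and bearing while radar tracks carry range and radial velocity: associate each camera object with the nearest radar track in azimuth, then take semantics from the camera and kinematics from the radar. All types, fields, and the azimuth gate below are illustrative assumptions, not DRIVE Perception interfaces.

    // Illustrative sketch only: pairs camera detections (good semantics) with radar
    // tracks (good range/velocity) by azimuth proximity.
    #include <cmath>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct CameraObject { std::string label; double azimuthRad; };          // class + bearing from camera
    struct RadarTrack   { double azimuthRad, rangeM, radialVelocityMps; };  // kinematics from radar
    struct FusedObject  { std::string label; double rangeM, radialVelocityMps; };

    std::vector<FusedObject> fuse(const std::vector<CameraObject>& cams,
                                  const std::vector<RadarTrack>& radars,
                                  double azimuthGateRad = 0.05) {
        std::vector<FusedObject> fused;
        for (const CameraObject& cam : cams) {
            const RadarTrack* best = nullptr;
            double bestDelta = azimuthGateRad;
            for (const RadarTrack& radar : radars) {
                const double delta = std::fabs(radar.azimuthRad - cam.azimuthRad);
                if (delta <= bestDelta) { best = &radar; bestDelta = delta; }
            }
            if (best) {
                // Semantic label from the camera, position/velocity from the radar.
                fused.push_back({cam.label, best->rangeM, best->radialVelocityMps});
            }
        }
        return fused;
    }

    int main() {
        std::vector<CameraObject> cams = {{"car", 0.10}, {"pedestrian", -0.30}};
        std::vector<RadarTrack> radars = {{0.11, 42.0, -3.5}, {-0.29, 12.0, 0.2}};
        for (const FusedObject& f : fuse(cams, radars))
            std::printf("%s: range %.1f m, radial velocity %.1f m/s\n",
                        f.label.c_str(), f.rangeM, f.radialVelocityMps);
        return 0;
    }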



At the heart of NVIDIA DRIVE™ Perception are NVIDIA DRIVE™ Networks that deliver deep neural network (DNN) models that have been trained on thousands of hours of high-quality labeled data to produce outputs for use in obstacle, path, and wait conditions perception. DRIVE Networks include both convolutional and recurrent neural network models. The DNN modules also include optimized functionality to precondition the input, run inference on a GPU or Deep Learning Accelerator (DLA), and post-process the network output for consumption by the NVIDIA DRIVE™ Perception modules.
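
The preprocess / infer / postprocess staging around each DNN can be pictured with the skeleton below. The class, the stand-in inference call, and the toy thresholding decoder are illustrative assumptions and do not correspond to the DriveWorks or DRIVE Perception APIs.

    // Illustrative skeleton only: the three stages wrapped around each DNN module.
    #include <cstdio>
    #include <vector>

    using Image = std::vector<float>;    // placeholder for a camera frame
    using Tensor = std::vector<float>;   // placeholder for network input/output

    class PerceptionDnnModule {
    public:
        std::vector<int> run(const Image& frame) {
            Tensor input = preprocess(frame);   // e.g. resize, crop, normalize for the network
            Tensor output = infer(input);       // runs on GPU or DLA in the real pipeline
            return postprocess(output);         // e.g. decode boxes / clusters for perception modules
        }
    private:
        Tensor preprocess(const Image& frame) { return Tensor(frame.begin(), frame.end()); }
        Tensor infer(const Tensor& input) { return input; }   // stand-in for the trained DNN
        std::vector<int> postprocess(const Tensor& output) {
            std::vector<int> detections;
            for (size_t i = 0; i < output.size(); ++i)
                if (output[i] > 0.5f) detections.push_back(static_cast<int>(i));  // threshold as a toy decoder
            return detections;
        }
    };

    int main() {
        PerceptionDnnModule module;
        std::vector<int> hits = module.run(Image{0.1f, 0.9f, 0.7f, 0.2f});
        std::printf("%zu detections above threshold\n", hits.size());
        return 0;
    }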


Details


Obstacle Perception

The NVIDIA DRIVE™ Perception pipeline for obstacle perception consists of interacting algorithmic modules built around NVIDIA DRIVE™ Networks DNNs, along with DNN post-processing. Capabilities include:

Camera-based:

  • Obstacle detection and classification, including cars and pedestrians, as well as distance-to-object estimation (based on DriveNet DNN)
  • Drivable free-space detection (based on OpenRoadNet DNN)
  • Camera image clarity detection and classification (based on ClearSightNet DNN)
  • Semantic motion segmentation (SMS) for detection of both static and dynamic objects

Radar-based:

  • Surround obstacle detection and tracking over time


DriveNet

DriveNet is used for obstacle perception. It detects and classifies objects such as vehicles, pedestrians, and bicycles. DriveNet also includes temporal models for future object motion prediction.




OpenRoadNet

OpenRoadNet detects drivable free space around objects. It predicts the boundary that separates space occupied by obstacles from unoccupied drivable space.
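
One common way to represent such a boundary, used here purely as an assumption about the output format, is one boundary point per image column; everything below the boundary (toward the bottom of the image) is treated as drivable.

    // Illustrative sketch only: a free-space boundary stored as one boundary row
    // per image column; pixels below the boundary are treated as drivable.
    #include <cstdio>
    #include <vector>

    struct FreeSpaceBoundary {
        std::vector<int> boundaryRow;   // index i = image column, value = first non-drivable row
        int imageHeight;

        bool isDrivable(int column, int row) const {
            // Image rows grow downward, so "below the boundary" means a larger row index.
            return row > boundaryRow[column] && row < imageHeight;
        }
    };

    int main() {
        // Toy 8-column boundary for a 100-row image; an obstacle lowers the boundary near columns 4-5.
        FreeSpaceBoundary fs{{40, 40, 41, 42, 70, 72, 43, 42}, 100};
        std::printf("column 2, row 60 drivable: %s\n", fs.isDrivable(2, 60) ? "yes" : "no");
        std::printf("column 5, row 60 drivable: %s\n", fs.isDrivable(5, 60) ? "yes" : "no");
        return 0;
    }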





Path Perception

The NVIDIA DRIVE™ Perception pipeline for path perception consists of interacting algorithmic modules built around NVIDIA DRIVE™ Networks DNNs, including DNN post-processing and the ability to consume HD Map input. Capabilities include:

  • Camera-based path perception (using PathNet DNN)
  • Lane, road marking, and landmark detection (using MapNet DNN)
  • Path perception signal generation using HD Map input data
  • Machine learning algorithms that enable diversity and redundancy in path perception by combining multiple individual path perception signals (e.g. multiple DNN-based outputs, HD Map-based outputs, egomotion-based outputs) and generating a combined (ensemble) path perception output along with a confidence metric

PilotNet

PilotNet is trained on human driving behavior to predict driving trajectories for lane keeping, lane changes, and lane splits and merges.

PathNet

PathNet predicts all the drivable paths and lane dividers in images, regardless of the presence or absence of lane line markings.

MapNet

MapNet detects visual landmarks such as lane lines, crosswalks, text marks, and arrow marks on the road surface. It can detect features useful for path perception, as well as mapping and localization.
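
As a sketch of how detected landmarks might be carried downstream (the container, enums, and fields here are assumptions, not MapNet's actual output format), each landmark can be stored as a typed polyline, with lane markings tagged by their position relative to the ego vehicle.

    // Illustrative sketch only: one possible container for detected landmarks.
    #include <cstdio>
    #include <vector>

    enum class LandmarkType { LaneMarking, Crosswalk, TextMark, ArrowMark, Pole };
    enum class LanePosition { Ego, Left, Right, Unknown };

    struct Point2D { float x, y; };   // image-space coordinates
    struct Landmark {
        LandmarkType type;
        LanePosition position;          // meaningful for lane markings only
        std::vector<Point2D> polyline;  // sampled points along the landmark
    };

    int main() {
        std::vector<Landmark> landmarks = {
            {LandmarkType::LaneMarking, LanePosition::Ego,     {{310, 700}, {330, 500}, {345, 380}}},
            {LandmarkType::ArrowMark,   LanePosition::Unknown, {{420, 650}, {430, 600}}},
        };
        std::printf("detected %zu landmarks\n", landmarks.size());
        return 0;
    }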



Wait Perception

The NVIDIA DRIVE™ Perception pipeline for wait conditions perception consists of interacting algorithmic modules built around NVIDIA DRIVE™ Networks DNNs, including DNN post-processing and the ability to consume HD Map input. Capabilities include:

  • Camera-based wait condition perception, such as perception of intersections, traffic lights, and traffic signs (using WaitNet DNN)
  • Camera-based traffic light state classification (using LightNet DNN)
  • Camera-based traffic sign type classification (using SignNet DNN)

WaitNet

WaitNet detects intersections, classifies intersection type, and estimates the distance to the intersection. WaitNet also detects traffic lights and traffic signs.


LightNet

LightNet classifies traffic light types (solid vs. arrows) as well as traffic light state (e.g. red vs. green vs. yellow).


SignNet

SignNet classifies traffic sign types (e.g. stop, yield, speed limit).
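
Taken together, these three networks suggest a detect-then-classify cascade: WaitNet-style detections propose regions, and the specialized classifiers refine traffic light state and sign type. The sketch below illustrates that staging only; the types, the stub classifiers, and the assumption that the classifiers run on detector crops are all illustrative, not the DRIVE Perception implementation.

    // Illustrative sketch only: a detect-then-classify cascade in the spirit of
    // WaitNet -> LightNet / SignNet.
    #include <cstdio>
    #include <string>
    #include <vector>

    enum class WaitObjectType { TrafficLight, TrafficSign, Intersection };

    struct WaitDetection {
        WaitObjectType type;
        float x, y, w, h;   // image-space region proposed by the detector
    };

    // Stand-ins for the specialized classifiers that would run on cropped detections.
    std::string classifyLightState(const WaitDetection&) { return "red"; }
    std::string classifySignType(const WaitDetection&) { return "stop"; }

    int main() {
        // Detections as they might come out of a WaitNet-style detector.
        std::vector<WaitDetection> detections = {
            {WaitObjectType::TrafficLight, 605, 120, 18, 42},
            {WaitObjectType::TrafficSign,  840, 200, 30, 30},
        };
        for (const WaitDetection& d : detections) {
            switch (d.type) {
                case WaitObjectType::TrafficLight:
                    std::printf("traffic light at (%.0f, %.0f): state %s\n", d.x, d.y, classifyLightState(d).c_str());
                    break;
                case WaitObjectType::TrafficSign:
                    std::printf("traffic sign at (%.0f, %.0f): type %s\n", d.x, d.y, classifySignType(d).c_str());
                    break;
                case WaitObjectType::Intersection:
                    break;   // distance-to-intersection handling would go here
            }
        }
        return 0;
    }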




Advanced Functions Perception

The NVIDIA DRIVE™ Perception pipeline for advanced functions perception consists of interacting algorithmic modules built around NVIDIA DRIVE™ Networks DNNs, including DNN post-processing. Capabilities include:

  • Camera-based assessment of the cameras’ ability to see clearly (using ClearSightNet DNN)
  • Camera-based light source perception for automatic high beam control (using AutoHighBeamNet DNN)

ClearSightNet

ClearSightNet determines where the camera view is blocked and classifies the output into one of three classes (clean, blurred, blocked).
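
How a downstream consumer acts on those three classes is not specified here; the sketch below shows one plausible gating policy, with both the enum and the policy itself being assumptions.

    // Illustrative sketch only: using a clean / blurred / blocked classification to
    // gate how much a downstream consumer trusts a camera.
    #include <cstdio>

    enum class CameraVisibility { Clean, Blurred, Blocked };

    bool cameraUsableForPerception(CameraVisibility v) {
        switch (v) {
            case CameraVisibility::Clean:   return true;
            case CameraVisibility::Blurred: return true;   // usable, but a consumer might degrade gracefully
            case CameraVisibility::Blocked: return false;  // do not trust detections from this camera
        }
        return false;
    }

    int main() {
        std::printf("blocked camera usable: %s\n",
                    cameraUsableForPerception(CameraVisibility::Blocked) ? "yes" : "no");
        return 0;
    }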


AutoHighBeamNet

AutoHighBeamNet generates a binary on/off control signal for automatic high beam control.
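
The raw per-frame on/off decision would typically be smoothed before driving the lamps; the debouncing policy below is an illustrative assumption, not part of AutoHighBeamNet.

    // Illustrative sketch only: smoothing a per-frame on/off high-beam signal so the
    // beams do not flicker on single-frame changes.
    #include <cstdio>

    class HighBeamDebouncer {
    public:
        explicit HighBeamDebouncer(int framesToSwitch) : framesToSwitch_(framesToSwitch) {}

        // Feed the raw per-frame network decision; returns the debounced command.
        bool update(bool rawOn) {
            if (rawOn == currentState_) { counter_ = 0; return currentState_; }
            if (++counter_ >= framesToSwitch_) { currentState_ = rawOn; counter_ = 0; }
            return currentState_;
        }
    private:
        int framesToSwitch_;
        int counter_ = 0;
        bool currentState_ = false;   // start with high beams off
    };

    int main() {
        HighBeamDebouncer debouncer(3);
        const bool raw[] = {true, false, true, true, true, true};   // noisy per-frame signal
        for (bool r : raw) std::printf("%d ", debouncer.update(r) ? 1 : 0);
        std::printf("\n");   // expected output: 0 0 0 0 1 1
        return 0;
    }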




Developing with DRIVE Perception

How to set up

You will need:

Steps:

  • Install NVIDIA DRIVE™ Perception using the SDK Manager.
  • Experiment with the NVIDIA DRIVE™ Perception Samples included in each DRIVE AV release. These samples demonstrate key capabilities and are intended as a starting point for developing and optimizing code.

How to develop

With DRIVE Hyperion, you can experience and evaluate DRIVE Perception functionality through RoadRunner, the NVIDIA autonomous driving application.

Development Tasks and Getting Started:

  • Evaluate NVIDIA DRIVE Perception: Save and analyze the DRIVE AV Application (RoadRunner) data log; refer to the RoadCast section in the NVIDIA RoadRunner user guide (found in the NVIDIA DRIVE Documentation).
  • Use the deep neural networks included in DRIVE Perception to create obstacle, path, and wait perception algorithms within your software: Several samples are included in the “samples” section of the DriveWorks SDK Reference guide contained in the DRIVE Software Documentation.

For more detail and examples on how to use the DRIVE Perception APIs, refer to the following samples:

  • DriveNet Sample: Reading video streams sequentially, it detects the object locations in each frame and tracks the objects between video frames. The tracker uses feature motion to predict the object location.
  • DriveNetNCameras Sample: Uses two GPUs to perform the same task as the DriveNet sample; inference and tracking are split between GPU 0 (inference) and GPU 1 (tracking).
  • Free-Space Detection Sample (OpenRoadNet): Demonstrates the detection of collision-free space in the road scenario.
  • Lane Detection Sample (LaneNet): Performs lane marking detection on the road. It detects the lane you are in (ego-lane) and the left and right adjacent lanes, if present. LaneNet has been trained with RCB images and aggressive data augmentation, which allows the network to perform correctly when using RGB-encoded H.264 videos.
  • Light Classification Sample (LightNet): Demonstrates the detection of traffic lights facing the ego car.
  • Sign Classification Sample (SignNet): Demonstrates the detection of traffic signs facing the ego car.
  • ClearSightNet Sample: Performs DNN inference on a live camera feed or H.264 videos, evaluating each frame to detect blindness.
  • Path Detection Sample (PathNet): Demonstrates detection of the ego path as well as the left and right adjacent paths.
  • Landmark Detection Sample (MapNet): Performs landmark detection on the road. Detected landmarks are lane markings and poles; lane markings are tagged with their position relative to the car (ego-lane, left, right). MapNet has been trained with RCB images and aggressive data augmentation, which allows the network to perform correctly when using RGB-encoded H.264 videos.

Additional Development Resources:

  • Documentation