Why the best way to detect objects is not to detect them

A Comparison of Environmental Model Architectures for Automated Driving

Automated driving functions with higher automation levels require safe path planning that considers dynamic objects. If object dynamics are derived using approaches that rely on object detection, this involves severe risks due to error propagation in the processing chain. Integrated sensor fusion algorithms like the dynamic grid avoid these errors and provide the basis for safe path planning.

Consistent Path Planning Requires Reliable Object Dynamics

Any higher-level automated driving function needs to determine where the vehicle should go in the future. From a global perspective, this is a navigation task requiring reliable positioning and a map. However, to avoid colliding with other traffic participants or obstacles, a short-term motion plan is needed in addition to the global plan. Determining such a short-term plan is often called path planning.

As collision-free driving is of the highest priority, path planning is highly safety-relevant. From a pure safety perspective, the vehicle could drive to any location that is not occupied by an obstacle, i.e., any location within the free space. For static environments, we could determine the free space with a sensor such as a LiDAR. Then, it would be sufficient to plan a path such that the vehicle always remains within the determined free space while the path is driven. This would be safe, as in a static environment all objects remain at their current positions while the vehicle is moving.
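The static case can be sketched in a few lines: if the sensed free space does not change, checking a path reduces to checking that every waypoint falls into a free cell. The grid layout, cell size, and obstacle positions below are illustrative assumptions, not part of any specific system.

```python
# Minimal sketch of static path planning against sensed free space.
# Cell size, grid layout, and obstacle positions are illustrative assumptions.

CELL_SIZE = 0.5  # meters per grid cell

def world_to_cell(x, y):
    """Map world coordinates (meters) to integer grid indices."""
    return int(x / CELL_SIZE), int(y / CELL_SIZE)

def path_is_safe(occupied, path):
    """In a static environment, a path is safe iff every waypoint
    lies inside the sensed free space (i.e., in no occupied cell)."""
    return all(world_to_cell(x, y) not in occupied for x, y in path)

occupied = {(10, 10)}  # one static obstacle reported by, e.g., a LiDAR
print(path_is_safe(occupied, [(0.5, 0.5), (2.0, 2.0)]))  # True
print(path_is_safe(occupied, [(5.0, 5.0)]))              # False
```

This guarantee only holds because nothing moves; the moment the environment contains dynamic objects, a snapshot of the free space is no longer sufficient.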


Once the environment contains dynamic objects, static path planning would not be safe anymore as the free space could change while the vehicle is driving the planned path. To overcome this limitation, we need to plan the path taking the motion of other traffic participants into account. If we know the object's dynamics, such as velocity and driving direction, we can predict where these objects will occupy space at some future point in time and adapt the planned path accordingly.
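The prediction step described above can be sketched with a constant-velocity model. The function name and all numbers are illustrative assumptions; real systems use richer motion models and uncertainty estimates.

```python
# Hedged sketch: predict where a dynamic object will occupy space at a future
# point in time, assuming a constant-velocity model.

def predict_position(x, y, vx, vy, dt):
    """Constant-velocity prediction of an object's position after dt seconds."""
    return x + vx * dt, y + vy * dt

# A crossing vehicle at (20 m, 0 m) driving 10 m/s in the +y direction
# will occupy space around (20 m, 20 m) two seconds from now, so the
# planned path must stay clear of that region at that time.
print(predict_position(20.0, 0.0, 0.0, 10.0, dt=2.0))  # (20.0, 20.0)
```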

Approaches Based on Object Detection Introduce Risks to Path Planning

In general, sensors provide no (camera, LiDAR) or only limited (radar) information on object dynamics. Instead, they provide 2D/3D locations of reflections (LiDAR, radar) or pixel-level data (camera). Radar sensors are a partial exception, as they provide the objects' radial (Doppler) velocity; however, they cannot determine the velocity of crossing traffic, and the measurements often contain additional velocity ambiguities. The classical approach to determining the object dynamics is

  • extracting or detecting objects in the raw sensor data to obtain 2D/3D bounding boxes in world or image coordinates, and then
  • applying object tracking to the detected bounding boxes to infer the dynamics over time.
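The two-step pipeline can be illustrated with a toy tracker. Here a hypothetical detector yields one bounding-box position per sensor cycle, and a simple alpha-beta filter infers the velocity over time; the cycle time and filter gains are illustrative assumptions, and a production system would use a Kalman filter with multi-object data association.

```python
# Sketch of the classical detect-then-track pipeline (all values illustrative).

DT = 0.1            # sensor cycle time in seconds (assumed)
ALPHA, BETA = 0.5, 0.1  # alpha-beta filter gains (assumed)

def track(detections):
    """Infer position and velocity from a sequence of detected 1D positions."""
    x, v = detections[0], 0.0
    for z in detections[1:]:
        x_pred = x + v * DT        # predict with current velocity estimate
        r = z - x_pred             # innovation: detection vs. prediction
        x = x_pred + ALPHA * r     # correct position
        v = v + (BETA / DT) * r    # correct velocity
    return x, v

# An object moving at 10 m/s, detected once per cycle:
detections = [i * 10.0 * DT for i in range(20)]  # 0.0, 1.0, 2.0, ...
x, v = track(detections)
print(round(v, 1))  # close to 10.0 m/s
```

Note that the velocity estimate is only as good as the detections it is fed: a missed or false detection enters the filter directly, which is exactly the error propagation discussed below.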

Typically, detection failures (false negatives) and other detection errors have a much higher negative impact on the quality of the resulting environmental model than other sensor errors, e.g., the limited accuracy of a radar's range measurement. Therefore, approaches based on object detection contain inherent risks for safety-related aspects like path planning.


Several architectures have been proposed for these detection-based approaches. They come with potential drawbacks:

  • Sensor-wise object detection and tracking: In this architecture, so-called smart sensors detect and track objects based on each sensor's individual information. Smart sensors are often intended to be directly connected to an L2 driving function such as AEB or ACC and thus optimize for low false-alarm rates. Consequently, these systems rather miss an object than raise a false alarm.
  • Object fusion on already tracked objects: Each smart sensor has different assumptions and models objects differently. Often, these assumptions are hardly known to the designer of the object fusion, which leads to incorrect models and limited object fusion performance. Even if the assumptions are known, they often cannot be integrated into the object fusion due to typical modeling limitations of the object fusion algorithms.
  • Propagation of detection errors: Path planning occurs after several processing steps where errors propagate and potentially accumulate. As object dynamics are derived from detected objects, any error in the object detection will potentially lead to errors in the object dynamics and thus affect the safety of the path planning.


Improved architectures combine the sensor data at a low level and detect and track objects based on this fused information.

However, these architectures still rely on object detection and on a separate determination of dynamic objects and free space and thus might be affected by the same error sources. In addition, this architecture has further disadvantages:

  • Centralized object detection: Although the object detection can rely on more information, the overall architecture still depends on the object detection performance, and object detection errors propagate to the path planning.
  • Learning required: Artificial intelligence (AI) as part of the object detection requires training specific to the actual sensor setup. Thus, fully pre-trained modules (e.g., coming from a supplier) cannot be used.

Dynamic Grid Fusion Allows for Early and Consistent Path Planning

While the classical approach requires detecting objects to derive their dynamics and thus potentially introduces errors, dynamic grid fusion determines quantities like velocity and driving direction at a lower level – the cell level. As a track-before-detect approach, it does not need to detect objects explicitly. Instead, dynamic quantities are determined for each cell individually.
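The cell-level idea can be sketched as follows: each occupied cell carries its own velocity estimate, so occupancy can be predicted forward in time without ever segmenting an object. The data layout below is an illustrative assumption, not the actual dynamic grid algorithm.

```python
# Hedged sketch of cell-level occupancy prediction (layout is illustrative).

CELL = 0.5  # meters per grid cell (assumed)

# Each occupied cell: (i, j, vx, vy) -- a velocity estimate per cell,
# with no object ever being detected or segmented.
cells = [(10, 10, 5.0, 0.0), (10, 11, 5.0, 0.0)]  # two cells moving at 5 m/s

def predict_cells(cells, dt):
    """Shift every occupied cell by its own velocity estimate."""
    predicted = set()
    for i, j, vx, vy in cells:
        predicted.add((i + round(vx * dt / CELL), j + round(vy * dt / CELL)))
    return predicted

print(predict_cells(cells, dt=1.0))  # {(20, 10), (20, 11)}
```

Because the prediction operates on cells, a missed object detection simply cannot occur at this stage; there is no detection step whose failure could propagate.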

By design, typical detection errors cannot negatively impact path planning when objects do not need to be detected. However, other functionalities may still require explicitly detected objects, e.g., to apply traffic rules such as giving way to other traffic participants. An architecture based on dynamic grid fusion allows for freely choosing the best data level for each function – cell level for path planning, object level for other functionalities.

An architecture based on dynamic grid fusion has the following advantages:

  • Safe path planning through consistent occupancy: The occupancy of relevant regions is consistently predicted at the cell level. As no object detection is involved, detection errors cannot propagate to the path planning.
  • Support for unknown dynamic objects: The dynamic grid determines the dynamics of objects even if their type is unknown at design time.
  • No artificial intelligence involved: Dynamic grid fusion does not contain AI and is purely based on statistical methods. For safety-critical applications, this means verification and validation require less effort.
  • Optional object detection: Although path planning can be done on the cell level, explicit or additional object extraction is still possible and well supported as the available dynamic information makes object extraction and clustering more reliable.
  • Flexible sensor configuration: Different sensor modalities can be freely combined depending on the application; no specific modality, such as a camera, is required.
  • No learning required: As no AI is involved in the dynamic grid, data-extensive learning is not required. Optionally, the dynamic grid architecture can include fully pre-trained modules from suppliers.
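The optional object extraction mentioned above can be illustrated with a toy clustering step: neighboring cells with similar velocity estimates are grouped into one object, which is exactly where the per-cell dynamics pay off. The velocity tolerance, adjacency rule, and data layout are illustrative assumptions.

```python
# Hedged sketch of object extraction on top of a dynamic grid: adjacent cells
# with similar velocities are greedily merged into one cluster ("object").

def cluster_cells(cells, v_tol=1.0):
    """Group cells that are grid-adjacent and differ in velocity
    by less than v_tol (m/s) into the same cluster."""
    clusters = []
    for cell in cells:
        i, j, vx, vy = cell
        for cl in clusters:
            if any(abs(i - ci) <= 1 and abs(j - cj) <= 1
                   and abs(vx - cvx) < v_tol and abs(vy - cvy) < v_tol
                   for ci, cj, cvx, cvy in cl):
                cl.append(cell)
                break
        else:  # no matching cluster found -> start a new one
            clusters.append([cell])
    return clusters

# Two adjacent cells moving together plus one static cell -> two objects:
cells = [(5, 5, 8.0, 0.0), (5, 6, 8.2, 0.0), (12, 3, 0.0, 0.0)]
print(len(cluster_cells(cells)))  # 2
```

Without the per-cell velocities, the two moving cells and the static cell could only be separated by their positions; with them, even touching objects with different dynamics remain distinguishable.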