Abstract

Estimating the range to the closest object in front is the core component of the forward collision warning (FCW) system. Previous monocular range estimation methods mostly involve two sequential steps: object detection followed by range estimation. As a result, they are only effective for objects from specific categories, rely on expensive object-level annotations for training, and do not handle unseen categories. In this paper, we present an end-to-end deep learning architecture to solve these problems. Specifically, we represent the target range as a weighted sum of a set of potential distances. These potential distances are generated by inverse perspective projection based on intrinsic and extrinsic camera parameters, while a deep neural network predicts the corresponding weights. The whole architecture is optimized for the range estimation task directly, in an end-to-end manner, with only the target range as supervision. Since object category is not restricted during training, the proposed method generalizes to objects of unseen categories. Furthermore, camera parameters are explicitly considered, enabling the method to generalize to images taken with different cameras and from novel views. Additionally, the proposed method is not a pure black box but provides partial interpretability: visualizing the produced weights shows which part of the image dominates the final result. We conduct experiments on synthetic and real-world data to verify these properties.
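As a rough illustration of how such potential distances can be obtained by inverse perspective projection, the sketch below assumes a pinhole camera mounted at a known height above a planar road and pitched by a known angle; the names `fx`, `fy`, `cu`, `cv`, `cam_height`, and `pitch`, as well as the sign conventions, are ours for illustration and not the paper's notation.

```python
import numpy as np

def ground_distance_map(H, W, fx, fy, cu, cv, cam_height, pitch):
    """Per-pixel potential distance via inverse perspective projection.

    Assumes a pinhole camera `cam_height` metres above a planar road,
    rotated about its x-axis by `pitch` radians (x right, y down, z forward).
    Pixels whose rays do not hit the road ahead (at or above the horizon)
    are set to NaN. Conventions are illustrative only.
    """
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Ray direction of each pixel in camera coordinates.
    ray = np.stack([(u - cu) / fx, (v - cv) / fy,
                    np.ones_like(u, dtype=float)], axis=-1)
    # Rotate rays into road coordinates (undo the camera pitch).
    c, s = np.cos(pitch), np.sin(pitch)
    R = np.array([[1, 0, 0],
                  [0, c, -s],
                  [0, s,  c]])
    ray = ray @ R.T
    # Intersect each ray with the ground plane located `cam_height` below the camera.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = cam_height / ray[..., 1]      # scale along the ray
    t[ray[..., 1] <= 0] = np.nan          # ray points upward: no ground hit
    x, z = t * ray[..., 0], t * ray[..., 2]
    return np.sqrt(x ** 2 + z ** 2)       # planar distance from the camera
```

Because these distances follow directly from the intrinsic and extrinsic parameters, the network only has to predict how to weight them, which is what lets camera parameters enter the estimate explicitly instead of being absorbed into learned weights.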

Highlights

  • Range estimation, estimating the distance from the ego vehicle to the closest object in front, is the core component of the forward collision warning (FCW) system

  • We propose a novel end-to-end network to estimate the range to the closest object in front from a monocular image for forward collision warning (FCW)

  • The target range is represented as a weighted sum of a set of potential distances

Summary

Introduction

Range estimation, i.e., estimating the distance from the ego vehicle to the closest object in front, is the core component of the forward collision warning (FCW) system. Compared with relying on a known vehicle size, the planar road surface assumption is more general and yields more accurate results [12]. All these traditional approaches first detect the object and then estimate its distance by applying perspective projection to the chosen cue. We use fully convolutional networks as the encoder and decoder of a U-Net structure and apply fully connected layers to the encoded features to provide spatially-specific operations and a global receptive field. We mask both the distance map and the weight map so that only pixels inside the preset collision region remain, and estimate the range as a weighted sum of distances, as sketched below. Our method is not a pure black box but provides partial interpretability, because the produced weight map can be visualized to indicate which part of the image dominates the estimated range.
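A minimal sketch of that last step, under our own naming: the distance map from inverse perspective projection and the network's per-pixel scores are restricted to the preset collision region, the surviving weights are normalized (a softmax here, as one plausible choice rather than the paper's exact scheme), and the range is their inner product with the distances.

```python
import numpy as np

def estimate_range(distance_map, weight_logits, collision_mask):
    """Range as a weighted sum of per-pixel potential distances.

    distance_map   : (H, W) distances from inverse perspective projection
    weight_logits  : (H, W) raw per-pixel scores predicted by the network
    collision_mask : (H, W) boolean mask of the preset collision region
    Only pixels inside the collision region with a valid ground distance
    contribute; their weights are normalized to sum to one.
    """
    valid = collision_mask & np.isfinite(distance_map)
    assert valid.any(), "no valid pixels in the collision region"
    logits = weight_logits[valid]
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    return float(np.sum(weights * distance_map[valid]))
```

Visualizing the normalized weights over the image is what provides the partial interpretability: high-weight pixels mark the part of the collision region that drives the estimated range.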

Overview
Distance Map Generation
Weight Map Generation
End-to-End Learning
Training Settings
Experiment Setup
Synthetic Dataset
Real-World Data Collection
Interpretability
Class-Agnostic Property
Generalization Capability
Closest Object
Comparison
Failure Cases
Conclusions and Future Work