Abstract

Understanding the 3D semantics of surrounding objects is a critically important and challenging requirement for safe autonomous driving. We present a localization-prioritized approach that effectively localizes an object's position in the 3D world and fits a complete 3D box around it. Our method requires a single image and performs both 2D and 3D detection in an end-to-end fashion. Estimating the depth of an object from a monocular image does not generalize as well as estimating its pose and dimensions. Hence, we approach this problem by effectively localizing the projection of the center of the bottom face of the 3D bounding box (CBF) onto the image. Later, in our post-processing stage, we use a lookup-table-based approach to reproject the CBF into the 3D world. This stage is a one-time setup and simple enough to be deployed in fixed-map communities where complete knowledge of the ground plane can be stored. The object's dimensions and pose are predicted in a multitask fashion using a shared set of features. Experiments show that our method produces smooth tracks for surrounding objects and outperforms existing image-based approaches in 3D localization.
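
To make the reprojection stage concrete, the sketch below shows one way such a lookup table could be built and queried. It assumes a flat ground plane at a known camera height and a pinhole camera with intrinsics K (no lens distortion); the function names `build_cbf_lookup_table` and `reproject_cbf`, and the flat-plane simplification, are illustrative assumptions rather than the paper's exact procedure, which may store richer ground-plane knowledge for non-flat terrain.

```python
import numpy as np

def build_cbf_lookup_table(K, cam_height, img_w, img_h):
    """One-time setup: for every pixel, precompute the 3D ground-plane point
    hit by its back-projected ray. Assumes a y-down camera frame with a flat
    ground plane at y = cam_height below the camera (hypothetical setup)."""
    K_inv = np.linalg.inv(K)
    table = np.full((img_h, img_w, 3), np.nan)
    for v in range(img_h):
        for u in range(img_w):
            ray = K_inv @ np.array([u, v, 1.0])  # ray direction in camera frame
            if ray[1] <= 1e-6:        # ray points at or above the horizon
                continue
            t = cam_height / ray[1]   # scale at which the ray meets the ground
            table[v, u] = t * ray     # 3D point (X, Y, Z) in camera coordinates
    return table

def reproject_cbf(table, u, v):
    """Look up the 3D position of a detected CBF pixel (u, v)."""
    return table[int(round(v)), int(round(u))]

# Example query with KITTI-like intrinsics and camera height (illustrative values).
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])
table = build_cbf_lookup_table(K, cam_height=1.65, img_w=1242, img_h=375)
print(reproject_cbf(table, u=620.0, v=250.0))  # 3D point on the ground plane
```

Because the table depends only on the camera calibration and the stored ground-plane model, it can be computed once per deployment site, which matches the paper's claim that the stage is a single-time setup suited to fixed-map communities.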
