Abstract

In this paper, we present a method for estimating the position, size, and orientation of a vehicle from a single monocular image. The proposed method uses inverse perspective mapping to estimate distance from the image. It consists of two stages: 1) cancel the pitch and roll motion of the camera using an inertial measurement unit, and project the corrected front-view image onto the bird's eye view using inverse perspective mapping; 2) detect the position, size, and orientation of the vehicle using a convolutional neural network. The camera motion cancellation keeps the vanishing point at the same location regardless of changes in the attitude of the ego vehicle. As a result, the projected bird's eye view image is parallel to the x-y plane of the vehicle coordinate system and linearly related to it. The convolutional neural network predicts not only the position and size but also the orientation of the vehicle for 3D localization. The oriented bounding box predicted in the bird's eye view image is converted to meters using the inverse projection matrix. The proposed method was evaluated on the KITTI raw dataset using root mean square error, mean average percentage error, and average precision. Despite its conceptually simple architecture, the proposed method achieves promising performance compared to other image-based approaches. A video demonstration is available online [1].
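
The geometry behind the two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the coordinate frames (vehicle: x forward, y left, z up; camera: x right, y down, z forward), the rotation order and signs, the intrinsics `K`, the camera height, the bird's eye view layout, and all function names are assumptions chosen for illustration. The sketch builds a ground-plane homography that includes the IMU-measured pitch and roll, then converts an oriented box from bird's eye view pixels to meters.

```python
import numpy as np

# Assumed axis change: vehicle frame (x forward, y left, z up)
# to camera frame (x right, y down, z forward).
R0 = np.array([[0.0, -1.0, 0.0],
               [0.0, 0.0, -1.0],
               [1.0, 0.0, 0.0]])

def attitude(pitch, roll):
    """Camera attitude as Rz(roll) @ Rx(pitch), in radians (assumed convention)."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    return rz @ rx

def ground_to_image(K, height, pitch=0.0, roll=0.0):
    """Homography from ground points (X, Y, 1) in meters to homogeneous pixels."""
    R = attitude(pitch, roll) @ R0
    c = np.array([0.0, 0.0, height])  # camera center in the vehicle frame
    # For a point on the plane z = 0: p_cam = R @ (X*e1 + Y*e2 - c).
    return K @ R @ np.column_stack([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], -c])

def bev_to_ground(res, x_max, y_max):
    """Affine map from BEV pixels (col, row, 1) to ground meters (X, Y, 1).

    Assumed layout: row 0 is the far edge (X = x_max), column 0 is the
    left edge (Y = y_max), and res is meters per pixel.
    """
    return np.array([[0.0, -res, x_max],
                     [-res, 0.0, y_max],
                     [0.0, 0.0, 1.0]])

def apply_h(H, pt):
    """Apply a 3x3 homography to a 2D point and dehomogenize."""
    q = H @ np.array([pt[0], pt[1], 1.0])
    return q[:2] / q[2]

def obb_to_metric(box, A):
    """Convert (col, row, width_px, length_px, theta) to meters and vehicle yaw."""
    u, v, w, l, theta = box
    X, Y = apply_h(A, (u, v))
    res = abs(A[0, 1])  # meters per pixel, read back from the affine map
    # Rotate the heading direction with the linear part of the affine map.
    d = A[:2, :2] @ np.array([np.cos(theta), np.sin(theta)])
    return X, Y, w * res, l * res, np.arctan2(d[1], d[0])

# Illustrative values only, not an actual KITTI calibration.
K = np.array([[700.0, 0.0, 620.0],
              [0.0, 700.0, 190.0],
              [0.0, 0.0, 1.0]])
ground_pt = np.array([15.0, 2.0])  # meters in the vehicle frame

# The camera is pitched and rolled when the image is taken.
H_true = ground_to_image(K, 1.65, pitch=0.02, roll=-0.01)
pixel = apply_h(H_true, ground_pt)

# Back-projection with and without the measured attitude.
compensated = apply_h(np.linalg.inv(H_true), pixel)
uncompensated = apply_h(np.linalg.inv(ground_to_image(K, 1.65)), pixel)

# A box predicted in the BEV image, converted to meters.
A = bev_to_ground(res=0.1, x_max=40.0, y_max=10.0)
box_m = obb_to_metric((100.0, 200.0, 18.0, 45.0, -np.pi / 2), A)

print("compensated:", compensated, "uncompensated:", uncompensated)
print("box in meters:", box_m)
```

In this sketch, back-projecting with the attitude-aware homography recovers the original ground point, while ignoring the pitch and roll shifts the estimate. This is the reason for cancelling camera motion before the bird's eye view projection. Because the bird's eye view is an affine image of the ground plane under these assumptions, the pixel-to-meter conversion of the box reduces to one matrix product and a scale factor.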
