Abstract

Object detection, a core task of computer vision, aims at locating and classifying objects of interest in a scene. Most existing object detection methods utilize RGB images captured by cameras. However, RGB images do not directly provide depth information, which could help an object detector achieve better performance in a complex environment. To address this problem, we present an early fusion architecture that performs object detection by combining RGB and depth images. The architecture first employs an unsupervised depth estimation technique to automatically infer a dense depth image from a single RGB input image. The depth image is then concatenated with the RGB image at a very low abstraction level, and the combined input is passed to a deep learning model for object detection. Finally, the architecture predicts multiple 2D bounding boxes to localize the objects. To generate the depth image, we investigate the effect of four well-known depth estimation methods on the performance of our fusion architecture. Moreover, we compare the fusion architecture with two uni-modal architectures that use only RGB or only depth images for object detection. Experimental results on the KITTI dataset show that our RGB-depth fusion approach outperforms both uni-modal architectures.
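As a rough illustration of the early-fusion step described above, the following is a minimal PyTorch-style sketch: an estimated depth map is concatenated with the RGB image as a fourth channel before the first convolution of a detector backbone. The `estimate_depth` function is a hypothetical stand-in (the abstract compares four estimators but does not name one here), and all layer shapes are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

def estimate_depth(rgb: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for a monocular depth estimator; a real
    # system would run one of the depth networks evaluated in the
    # paper. Returns a dummy dense (N, 1, H, W) depth map.
    n, _, h, w = rgb.shape
    return torch.zeros(n, 1, h, w)

class EarlyFusionStem(nn.Module):
    # First layer of a detector backbone adapted for early fusion:
    # the only change from a standard RGB stem is in_channels=4,
    # so depth is consumed at the lowest abstraction level.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=4, out_channels=64,
                              kernel_size=7, stride=2, padding=3)

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        depth = estimate_depth(rgb)             # (N, 1, H, W)
        fused = torch.cat([rgb, depth], dim=1)  # (N, 4, H, W): early fusion
        return self.conv(fused)

# Usage on a KITTI-sized batch (illustrative dimensions only):
stem = EarlyFusionStem()
features = stem(torch.randn(2, 3, 375, 1242))
print(features.shape)  # torch.Size([2, 64, 188, 621])
```

The design choice this sketch highlights is that early fusion requires almost no architectural change: only the input channel count of the first convolution grows from 3 to 4, leaving the rest of the detector untouched.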
