Abstract

Three-dimensional object detection aims to produce a three-dimensional bounding box covering the full extent of an object. Nowadays, three-dimensional object detection is mainly based on red green blue-depth (RGB-D) images. However, it remains an open problem because of the difficulty of labeling three-dimensional training data. In this article, we present a novel three-dimensional object detection method based on two-dimensional object detection, which takes only a set of RGB images as input. First, to meet the requirements of three-dimensional object detection and address the low localization accuracy of You Only Look Once, a modified two-dimensional object detection method based on You Only Look Once is proposed. Then, using a set of images from different visual angles, three-dimensional geometric data are reconstructed. In addition, using the modified You Only Look Once method, the two-dimensional object bounding boxes of the forward and side views are obtained. Finally, according to the transformation bet...

Highlights

  • Three-dimensional (3D) object detection predicts the category of an object along with its 3D bounding box in scenes such as point clouds

  • Because of the success of 2D object detection and the mapping relationship between 2D images and 3D space, we present a 3D object detection method based on 2D object detection

  • According to the transformation between the 2D pixel coordinate and the 3D coordinate, the 2D object bounding box is mapped onto the reconstructed 3D scene to form the 3D object box


Summary

Introduction

Three-dimensional (3D) object detection predicts the category of an object along with its 3D bounding box in scenes such as point clouds. Compared with traditional detection methods, R-CNN achieves excellent object detection accuracy, about 63%, but it has a notable drawback of slow detection speed, requiring 47 s per image. You Only Look Once (YOLO) [11] models object detection as a regression problem: it divides the input image into 7 × 7 grids and predicts two bounding boxes for each grid cell. Compared with YOLO's two predicted bounding boxes, the anchor boxes of Faster R-CNN take into account objects with different scales and aspect ratios. In our method, according to the transformation between the 2D pixel coordinate and the 3D coordinate, the 2D object bounding box is mapped onto the reconstructed 3D scene to form the 3D object box. This method only needs a collection of 2D images to train the M-YOLO model, so it has a wide range of applications.
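The mapping from a 2D pixel coordinate to a 3D coordinate mentioned above can be sketched with a standard pinhole camera model: given the camera intrinsic matrix K and a depth value for a pixel, the pixel back-projects to a 3D point in the camera frame, and the corners of a 2D bounding box lift to one face of a 3D box. The intrinsic values and depth below are illustrative placeholders, not values from the paper, and `backproject` / `bbox_to_3d` are hypothetical helper names.

```python
import numpy as np

# Hypothetical camera intrinsics (focal lengths fx, fy; principal point cx, cy).
# In practice these come from camera calibration, not from this paper.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def backproject(u, v, depth, K):
    """Map a 2D pixel (u, v) with known depth to a 3D point in the camera frame."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def bbox_to_3d(bbox, depth, K):
    """Lift the four corners of a 2D box (u_min, v_min, u_max, v_max),
    assumed to lie at a constant depth, into 3D: one face of a 3D box."""
    u0, v0, u1, v1 = bbox
    corners = [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
    return np.array([backproject(u, v, depth, K) for u, v in corners])

# A detected 2D box at an assumed depth of 5 m yields a 4x3 array of 3D corners.
face = bbox_to_3d((300, 200, 340, 260), depth=5.0, K=K)
```

Combining the faces obtained from the forward and side views (each detected by the modified YOLO) then bounds the object's 3D extent.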

