Abstract

Three-dimensional object detection aims to produce a three-dimensional bounding box covering the full extent of an object. Nowadays, three-dimensional object detection is mainly based on red green blue-depth (RGB-D) images. However, it remains an open problem because of the difficulty of labeling three-dimensional training data. In this article, we present a novel three-dimensional object detection method based on two-dimensional object detection, which takes only a set of RGB images as input. First, to meet the requirements of three-dimensional object detection and address the low localization accuracy of You Only Look Once, a modified two-dimensional object detection method based on You Only Look Once is proposed. Then, using a set of images from different visual angles, three-dimensional geometric data are reconstructed. In addition, using the modified You Only Look Once method, the two-dimensional object bounding boxes of the forward and side views are obtained. Finally, according to the transformation bet...

Highlights

  • Three-dimensional (3D) object detection predicts the category of an object along with its 3D bounding box in scenes such as point clouds

  • Because of the success of 2D object detection and the mapping relationship between 2D images and 3D space, we present a 3D object detection method based on 2D object detection

  • According to the transformation between the 2D pixel coordinate and the 3D coordinate, the 2D object bounding box is mapped onto the reconstructed 3D scene to form the 3D object box


Summary

Introduction

Three-dimensional (3D) object detection predicts the category of an object along with its 3D bounding box in scenes such as point clouds. Compared with traditional detection methods, R-CNN achieves excellent object detection accuracy, about 63%, but it has a notable drawback of slow detection speed, requiring 47 s per image. You Only Look Once (YOLO) [11] models object detection as a regression problem: it divides the input image into 7 × 7 grids and predicts two bounding boxes for each grid cell. Compared with YOLO's two predicted bounding boxes, the anchor boxes of Faster R-CNN take into account objects with different scales and aspect ratios. In our method, according to the transformation between the 2D pixel coordinate and the 3D coordinate, the 2D object bounding box is mapped onto the reconstructed 3D scene to form the 3D object box. This method only needs a collection of 2D images to train the M-YOLO model, so it has a wide range of applications.
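The mapping from a 2D pixel coordinate to a 3D coordinate mentioned above can be sketched with a standard pinhole camera model: given the camera intrinsic matrix K and a depth value for a pixel, the pixel back-projects to a 3D point in the camera frame, and the corners of a 2D bounding box lift to one face of a 3D box. The intrinsic values and depth below are illustrative placeholders, not values from the paper, and `backproject` / `bbox_to_3d` are hypothetical helper names.

```python
import numpy as np

# Hypothetical camera intrinsics (focal lengths fx, fy; principal point cx, cy).
# In practice these come from camera calibration, not from this paper.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def backproject(u, v, depth, K):
    """Map a 2D pixel (u, v) with known depth to a 3D point in the camera frame."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def bbox_to_3d(bbox, depth, K):
    """Lift the four corners of a 2D box (u_min, v_min, u_max, v_max),
    assumed to lie at a constant depth, into 3D: one face of a 3D box."""
    u0, v0, u1, v1 = bbox
    corners = [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
    return np.array([backproject(u, v, depth, K) for u, v in corners])

# A detected 2D box at an assumed depth of 5 m yields a 4x3 array of 3D corners.
face = bbox_to_3d((300, 200, 340, 260), depth=5.0, K=K)
```

Combining the faces obtained from the forward and side views (each detected by the modified YOLO) then bounds the object's 3D extent.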

