Abstract

3D object detection technology is of great significance for realizing intelligent perception and ensuring production safety in a workshop. Existing 3D object detection relies on large-scale, high-quality 3D annotation data and is therefore ill-suited to perception in actual workshop scenes. This paper proposes a multi-modal feature fusion 3D object detection method (MFF3D) for a production workshop. MFF3D proceeds in three steps: (1) an improved YOLOv3 obtains the 2D prior region of an object, and RGB-D saliency detection extracts the object's image pixels within that region; (2) the depth pixels corresponding to the object are back-projected to generate the object's frustum point cloud, and a multi-modal feature fusion strategy simplifies this frustum point cloud, removing outlier points and reducing the number of points, which replaces the 3D object reasoning process based on deep neural networks; (3) an axis-aligned bounding box algorithm generates the object's 3D bounding box, and principal component analysis (PCA) computes the object's pose information. MFF3D is applied in the workshop, and experiments verify its feasibility and detection accuracy. We built a production workshop object dataset (PWOD) for experimental evaluation. Using only a small amount of 2D annotation data and no 3D annotation data, experimental results show that at a 3D intersection-over-union (IoU3D) threshold of 0.5, the 3D mean average precision (mAP3D) reaches 60.31 and the detection speed reaches 3 FPS. MFF3D does not rely on 3D annotation data and can effectively detect objects in a production workshop.
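The geometric steps the abstract outlines — back-projecting masked depth pixels into a frustum point cloud, removing outlier points, fitting an axis-aligned bounding box, and estimating orientation with PCA — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the pinhole intrinsics, neighborhood size `k`, and outlier threshold `std_ratio` are assumed values, and the paper's multi-modal fusion strategy for simplifying the cloud is approximated here by a generic statistical outlier filter.

```python
import numpy as np

def depth_pixels_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into a camera-frame point cloud
    using the standard pinhole model (intrinsics are illustrative)."""
    v, u = np.nonzero(mask)            # pixel coordinates inside the object mask
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def remove_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors exceeds
    mean + std_ratio * std over the cloud (brute-force; parameters assumed,
    standing in for the paper's fusion-based simplification)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self-distance
    keep = knn_mean < knn_mean.mean() + std_ratio * knn_mean.std()
    return points[keep]

def aabb(points):
    """Axis-aligned bounding box as (min corner, max corner)."""
    return points.min(axis=0), points.max(axis=0)

def pca_axes(points):
    """Principal axes of the cloud via covariance eigendecomposition;
    columns are ordered from largest to smallest variance."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    _, vecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    return vecs[:, ::-1]               # principal axis in column 0
```

In this sketch the first column of `pca_axes` gives the object's dominant direction, from which a yaw angle can be derived; the AABB corners give the 3D box the method reports.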
