Abstract

The scene perception of a production workshop is key to realizing its intelligence. Three-dimensional (3D) instance segmentation based on pure deep learning is an effective method for scene perception, but it is difficult to apply in a production workshop because it requires a large number of 3D instance segmentation labels, which are difficult to collect. This paper proposes a bi-stage multi-modal 3D instance segmentation method that realizes high-precision 3D instance segmentation without 3D instance segmentation labels. The method has two stages: acquisition of two-dimensional (2D) prior information and instance segmentation of the 3D point cloud. In the first stage, an RGB-D multi-modal fusion instance segmentation network is proposed to solve the problem that similar objects are difficult to distinguish in workshop scenes. In the second stage, accurate 3D instance segmentation is achieved by combining the acquired 2D prior information with correlation filtering algorithms. The performance of the proposed 2D and 3D instance segmentation methods is verified on a self-built dataset, the Scene Objects for Production workshop (SOP) dataset. In 2D instance segmentation, fusing depth features improves the mean average precision (mAP) by 3.1 over an RGB-only baseline, to 72.1; in 3D instance segmentation, the mAP reaches 80.97 at an intersection over union (IoU) threshold of 0.35. These results indicate that the proposed method realizes accurate perception of workshop objects.
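The second stage rests on lifting 2D instance masks into the 3D point cloud. The abstract does not detail the correlation filtering step, so the sketch below only illustrates the standard pinhole back-projection that any such 2D-to-3D lifting assumes; the function name `lift_mask_to_points`, its interface, and the intrinsics values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def lift_mask_to_points(mask, depth, fx, fy, cx, cy):
    """Back-project the pixels of one 2D instance mask into a 3D point cluster.

    mask  : (H, W) bool array, True where the instance was segmented in 2D
    depth : (H, W) float array, metric depth aligned with the RGB image
    fx, fy, cx, cy : pinhole camera intrinsics (illustrative values below)
    """
    valid = mask.astype(bool) & (depth > 0)   # keep only pixels with valid depth
    v, u = np.nonzero(valid)                  # row (v) and column (u) indices
    z = depth[v, u]
    x = (u - cx) * z / fx                     # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)        # (N, 3) points in the camera frame


# Toy usage: a 4x4 frame with a 2x2 instance mask at 1 m depth.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
depth = np.full((4, 4), 1.0)
points = lift_mask_to_points(mask, depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(points.shape)  # (4, 3)
```

In a full pipeline, the per-instance point clusters obtained this way would then be refined against the scene point cloud, which is where the paper's correlation filtering would come in.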
