Abstract Accurately perceiving three-dimensional (3D) environments or objects is crucial for the advancement of artificial intelligence (AI) interaction technologies. Currently, various types of sensors are employed to obtain point cloud data for 3D object detection or segmentation tasks. While this multi-sensor approach provides more precise 3D data than monocular or stereo cameras, it is also more expensive. The advent of RGB-D cameras, which provide both RGB images and depth information, addresses this issue.
In this study, we propose a point cloud segmentation method based on image masks. An RGB-D camera captures color and depth images, and image masks are generated through object recognition and segmentation on the color image. Given the pixel-level mapping between the RGB image and the point cloud, these masks are then used to extract the point cloud data of the target objects. Experimental results show that the average segmentation accuracy is 84.78%, comparable to that of PointNet++ and nearly 23.97% higher than that of three traditional segmentation algorithms. The running time is reduced by 95.76% relative to PointNet++, the slowest method tested, and by 15.65% relative to LCCP, the fastest of the traditional methods. The proposed method addresses the low robustness and heavy reliance on manual feature extraction of traditional point cloud segmentation methods, providing a useful reference for the accurate segmentation of 3D point clouds.
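The pixel-to-point-cloud mapping described above can be illustrated with a minimal sketch: under a pinhole camera model, each masked depth pixel is back-projected into a 3D point, so the image mask directly selects the object's point cloud. The intrinsics (fx, fy, cx, cy) and the toy depth/mask data below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def mask_to_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into 3D camera-frame points.

    depth: (H, W) depth map in meters; mask: (H, W) boolean object mask.
    fx, fy, cx, cy: pinhole intrinsics (assumed known from calibration).
    Returns an (N, 3) array of points belonging to the masked object.
    """
    # Pixel coordinates inside the mask with valid (non-zero) depth
    v, u = np.nonzero(mask & (depth > 0))
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy example: a flat 4x4 depth map with a 2x2 object mask
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
points = mask_to_point_cloud(depth, mask, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(points.shape)  # one 3D point per masked pixel: (4, 3)
```

Because the mask is computed on the color image, this step reduces 3D segmentation to 2D recognition plus a cheap per-pixel projection, which is consistent with the large runtime reduction reported relative to point-cloud-only methods.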