In the industrial domain, estimating the pose of texture-less shiny parts is challenging but worthwhile. In this study, it is impractical to utilize texture information to obtain the pose because the features are likely to be affected by the surrounding objects. In addition, the colors of the metal parts are similar, making object segmentation challenging. This study proposes dividing the entire process into three steps: object detection, feature extraction, and pose estimation. We use the Mask-RCNN to detect objects and HRNet to extract the corresponding features. For metal parts of different shapes, different keypoints were chosen accordingly. Conventional contour-based methods are inapplicable to parts containing planar surfaces because the objects occlude each other in clustered environments. In this case, we used dense discrete points along the edges as semantic keypoints for metal parts containing planar elements. We chose skeleton points as semantic keypoints for parts containing cylindrical components. Subsequently, we combined the localization of semantic keypoints and the corresponding CAD model information to estimate the 6D pose of an individual object in sight. The implementation of deep learning approaches requires massive training datasets and intensive labeling. Thus, we propose a method to generate training datasets and automatically label them. Experiments show that the algorithm based on synthetic data performs well in a natural environment, despite not utilizing real scenario images for training.
Read full abstract