Abstract

Estimating the 6D poses of industrial parts is a fundamental task in automated industries. However, the scarcity of industrial part datasets and the effort required to retrain networks make pose estimation for unseen parts challenging. Although a few pre-trained networks are effective on unseen objects, they often struggle to encode the correct viewpoint for unseen industrial parts, whose geometry differs significantly from the objects seen during pre-training. They also overlook the viewpoint non-uniformity that frequently occurs in industrial settings, resulting in large 3D rotation errors. To address these issues, a novel 6D pose estimator for unseen industrial parts is proposed. First, a Self-to-Inter (S2I) viewpoint encoder is introduced to efficiently generate discriminative descriptors that capture the viewpoint information of the observed image. The S2I viewpoint encoder utilizes an Inter-viewpoint attention module to facilitate communication among prior viewpoints and leverages a saliency descriptor selection strategy to boost inference speed. Second, a Viewpoint Alignment Module (VAM) is established and integrated with the ICP refiner. The VAM aligns non-uniform viewpoints analytically, making the refinement process more efficient and the final predictions more accurate. Experimental results on the LINEMOD dataset demonstrate performance competitive with state-of-the-art methods. Furthermore, experiments on eight unseen industrial parts validate the exceptional generalizability of our method, highlighting its potential in industrial applications.
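To make the two encoder ideas named above concrete, the following is a minimal sketch, not the paper's implementation: the class and function names (InterViewpointAttention, select_salient), tensor shapes, and the L2-norm saliency score are all assumptions chosen to illustrate cross-attention among prior-viewpoint descriptors and top-k descriptor pruning for faster inference.

```python
# Hypothetical sketch of the two ideas named in the abstract; the paper's
# actual architecture, saliency criterion, and shapes may differ.
import torch
import torch.nn as nn


class InterViewpointAttention(nn.Module):
    """Lets descriptors from different prior viewpoints exchange information
    via multi-head self-attention (one plausible reading of
    "prior viewpoint communication")."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, desc: torch.Tensor) -> torch.Tensor:
        # desc: (B, V, D) -- one descriptor per sampled prior viewpoint.
        out, _ = self.attn(desc, desc, desc)
        return self.norm(desc + out)  # residual + norm, standard transformer block


def select_salient(desc: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k most salient descriptors; saliency is scored here by
    L2 norm purely for illustration (an assumed criterion)."""
    scores = desc.norm(dim=-1)            # (B, V) saliency scores
    idx = scores.topk(k, dim=-1).indices  # (B, k) indices of kept descriptors
    return torch.gather(desc, 1, idx.unsqueeze(-1).expand(-1, -1, desc.size(-1)))


if __name__ == "__main__":
    enc = InterViewpointAttention(dim=256)
    viewpoint_desc = torch.randn(2, 42, 256)  # 2 images, 42 prior viewpoints
    mixed = enc(viewpoint_desc)               # inter-viewpoint communication
    kept = select_salient(mixed, k=16)        # prune to 16 descriptors
    print(kept.shape)                         # torch.Size([2, 16, 256])
```

Pruning before matching is what yields the speedup: downstream viewpoint retrieval then compares the query against k descriptors instead of all V.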