Abstract

Although there has been significant progress in category-level object pose estimation in recent years, there is still considerable room for improvement. In this paper, we propose a novel transformer-based category-level 6D pose estimation method called CatFormer to enhance the accuracy pose estimation. CatFormer comprises three main parts: a coarse deformation part, a fine deformation part, and a recurrent refinement part. In the coarse and fine deformation sections, we introduce a transformer-based deformation module that performs point cloud deformation and completion in the feature space. Additionally, after each deformation, we incorporate a transformer-based graph module to adjust fused features and establish geometric and topological relationships between points based on these features. Furthermore, we present an end-to-end recurrent refinement module that enables the prior point cloud to deform multiple times according to real scene features. We evaluate CatFormer's performance by training and testing it on CAMERA25 and REAL275 datasets. Experimental results demonstrate that CatFormer surpasses state-of-the-art methods. Moreover, we extend the usage of CatFormer to instance-level object pose estimation on the LINEMOD dataset, as well as object pose estimation in real-world scenarios. The experimental results validate the effectiveness and generalization capabilities of CatFormer. Our code and the supplemental materials are avaliable at https://github.com/BIT-robot-group/CatFormer.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call