Abstract

Six-degree-of-freedom (6DoF) object pose estimation is a crucial task for virtual reality and accurate robotic manipulation. Category-level 6DoF pose estimation has recently become popular because it generalizes across an entire category of objects. However, current methods focus on data-driven differential learning, which makes them highly dependent on the quality of labeled real-world data and limits their ability to generalize to unseen objects. To address this problem, we propose multi-hypothesis (MH) consistency learning (MH6D) for category-level 6-D object pose estimation without using real-world training data. MH6D uses a parallel consistency learning structure, alleviating the uncertainty of single-shot feature extraction and promoting domain self-adaptation to reduce the synthetic-to-real domain gap. Specifically, three randomly sampled pose transformations are first applied in parallel to the input point cloud. An attention-guided category-level 6-D pose estimation network with channel attention (CA) and global feature cross-attention (GFCA) modules then estimates the three hypothesized 6-D object poses by effectively extracting and fusing global and local features. Finally, we propose a novel loss function that considers both intermediate-process and final-result information, allowing MH6D to perform robust consistency learning. We conduct experiments under two training data settings (i.e., synthetic data only, and synthetic plus real-world data) to verify the generalization ability of MH6D. Extensive experiments on benchmark datasets demonstrate that MH6D achieves state-of-the-art (SOTA) performance, outperforming most data-driven methods even without using any real-world data. The code is available at https://github.com/CNJianLiu/MH6D.
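To make the multi-hypothesis idea concrete, the following is a minimal PyTorch sketch of the parallel-transform step and a simple pairwise consistency penalty. Everything here is an illustrative assumption rather than the paper's implementation: the helper names (random_rotation, make_hypotheses, consistency_loss), the QR-based rotation sampling, the translation scale, and the mean-squared pairwise disagreement are all hypothetical, and MH6D's actual loss combines process and result terms not reproduced in this sketch.

```python
import torch


def random_rotation(batch: int) -> torch.Tensor:
    # Sample random 3x3 rotation matrices via QR decomposition of
    # Gaussian matrices (one plausible sampling scheme; an assumption).
    A = torch.randn(batch, 3, 3)
    Q, R = torch.linalg.qr(A)
    signs = torch.sign(torch.diagonal(R, dim1=-2, dim2=-1))
    Q = Q * signs.unsqueeze(-2)                   # make Q uniquely orthogonal
    det = torch.linalg.det(Q)
    Q[:, :, 0] = Q[:, :, 0] * det.unsqueeze(-1)   # enforce det(Q) = +1
    return Q


def make_hypotheses(points: torch.Tensor, n_hyp: int = 3, t_scale: float = 0.1):
    """points: (B, N, 3) observed point cloud. Returns n_hyp randomly
    transformed copies plus the transforms used, mirroring the three
    parallel branches described in the abstract."""
    hyps = []
    for _ in range(n_hyp):
        R = random_rotation(points.shape[0])            # (B, 3, 3)
        t = t_scale * torch.randn(points.shape[0], 3)   # (B, 3)
        p = points @ R.transpose(-1, -2) + t.unsqueeze(1)
        hyps.append((p, R, t))
    return hyps


def consistency_loss(preds):
    """preds: list of (R_hat, t_hat, R, t) per branch, where (R_hat, t_hat)
    is the pose predicted from the transformed cloud and (R, t) is the
    known random transform. Maps each estimate back to the original frame
    and penalizes pairwise disagreement (a stand-in for MH6D's loss)."""
    canon = []
    for R_hat, t_hat, R, t in preds:
        # If the branch saw p' = R p + t and predicts (R_hat, t_hat) for p',
        # then (R^T R_hat, R^T (t_hat - t)) is the pose of the original p.
        R0 = R.transpose(-1, -2) @ R_hat
        t0 = torch.einsum("bij,bj->bi", R.transpose(-1, -2), t_hat - t)
        canon.append((R0, t0))
    loss, n = torch.zeros(()), len(canon)
    for i in range(n):
        for j in range(i + 1, n):
            loss = loss + ((canon[i][0] - canon[j][0]) ** 2).mean()
            loss = loss + ((canon[i][1] - canon[j][1]) ** 2).mean()
    return loss / max(1, n * (n - 1) // 2)
```

In this sketch, the de-transformed pose estimates from all branches should agree on a single underlying object pose, so their pairwise disagreement gives a supervision signal that requires no real-world labels, which is the core of the consistency-learning argument.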
