Multi-Reference Flow-Guided Cross-Domain Reconstruction For General Object 6D Pose Estimation
Estimating the pose of an unseen object is challenging because neither its pose in object space nor its 3D shape can be learned during training. Previous methods rely on either template matching against numerous references or Transformer-based local feature matching. However, both approaches consider only the appearance of the rendered images and ignore the object’s shape properties. We therefore propose an optical flow-based cross-domain reconstruction method that leverages the object’s geometric features during pose estimation. We additionally introduce a cycle-consistency loss that provides self-supervision for the reconstruction, and a novel deformable aggregator that integrates misaligned features from the individual references. Our experiments demonstrate that the proposed method successfully estimates the geometric features of unseen objects and achieves competitive performance in general object pose estimation while maintaining fast inference.
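To make the idea of cycle consistency on flow fields concrete, the following is a minimal sketch of a generic forward-backward consistency residual. The abstract does not define the loss, so this is not the authors' formulation: the function name `cycle_consistency_error`, the `Flow` type, the nearest-neighbour lookup, and the L1 residual are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: flow[y][x] = (dx, dy) displacement in pixels.
Flow = List[List[Tuple[float, float]]]


def cycle_consistency_error(fwd: Flow, bwd: Flow) -> float:
    """Mean L1 forward-backward residual over pixels that land in bounds.

    Assumption: a consistent flow pair satisfies
    fwd(p) + bwd(p + fwd(p)) = 0 for every pixel p.
    """
    h, w = len(fwd), len(fwd[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            dx, dy = fwd[y][x]
            # Nearest-neighbour lookup of the backward flow at the warped position.
            tx, ty = round(x + dx), round(y + dy)
            if 0 <= tx < w and 0 <= ty < h:
                bx, by = bwd[ty][tx]
                total += abs(dx + bx) + abs(dy + by)
                count += 1
    # Pixels warped outside the image are ignored; return 0 if none remain.
    return total / count if count else 0.0


# Toy example: shift every pixel one step right, then one step left.
fwd = [[(1.0, 0.0)] * 3 for _ in range(2)]
bwd = [[(-1.0, 0.0)] * 3 for _ in range(2)]
error = cycle_consistency_error(fwd, bwd)
```

In a learned setting the nearest-neighbour lookup would typically be replaced by a differentiable bilinear warp so that the residual can be minimized by gradient descent; the loop form here is only meant to show what quantity is being penalized.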