Abstract

Template matching pose estimation methods based on deep learning have made significant advances via metric learning or reconstruction learning. Existing approaches primarily build a distinct template representation library (codebook) from rendered images for each object, which complicates training and increases memory cost in multi-object tasks. They also struggle to handle discrepancies between the training and test distributions, particularly for occluded objects, resulting in suboptimal matching accuracy. In this study, we propose a shared template representation learning method with augmented semantic features to address these issues. Our method learns representations concurrently, using metric and reconstruction learning as similarity constraints, and augments the network's response to objects through semantic feature constraints for better generalization. Furthermore, rotation matrices serve as templates for codebook construction, yielding superior matching accuracy compared with rendered-image templates. Notably, this decouples object categories from templates, so only a single shared codebook needs to be maintained in multi-object pose estimation tasks. Extensive experiments on the Linemod, Linemod-Occluded, and T-LESS datasets demonstrate that the proposed method, employing shared templates, achieves superior matching accuracy. Moreover, the proposed method is robust on a collected aircraft dataset, further validating its efficacy.
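To make the shared-codebook idea concrete, the sketch below shows one plausible reading of it: rotation matrices (rather than rendered images) are embedded by a small encoder into a single template space shared by all objects, and a query image feature is matched to the nearest rotation embedding by cosine similarity. This is a minimal illustration under our own assumptions; the names (RotationEncoder, build_shared_codebook, match_pose), the network sizes, and the rotation sampling are all hypothetical and are not the authors' implementation.

```python
# Illustrative sketch only; all names and architecture choices are assumptions,
# not the method described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationEncoder(nn.Module):
    """Embeds a flattened 3x3 rotation matrix into a shared template space."""
    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(9, 256), nn.ReLU(),
            nn.Linear(256, dim),
        )

    def forward(self, R):               # R: (N, 3, 3)
        z = self.mlp(R.flatten(1))      # (N, dim)
        return F.normalize(z, dim=1)    # unit norm, so dot product = cosine

def build_shared_codebook(encoder, rotations):
    """One codebook for all objects: keys are rotations, values are embeddings."""
    with torch.no_grad():
        return encoder(rotations)       # (N, dim)

def match_pose(img_feat, codebook, rotations):
    """Return the rotation whose template embedding best matches each query."""
    img_feat = F.normalize(img_feat, dim=1)   # (B, dim)
    sims = img_feat @ codebook.t()            # (B, N) cosine similarities
    best = sims.argmax(dim=1)                 # nearest template per query
    return rotations[best], sims.max(dim=1).values

# Usage: sample 1000 rotations (crude placeholder via QR of random matrices,
# with the determinant sign fixed so each matrix lies in SO(3)).
A = torch.randn(1000, 3, 3)
Q, _ = torch.linalg.qr(A)
Q = Q * torch.sign(torch.linalg.det(Q)).view(-1, 1, 1)

enc = RotationEncoder()
codebook = build_shared_codebook(enc, Q)
query = torch.randn(4, 128)                   # stand-in for CNN image features
R_hat, scores = match_pose(query, codebook, Q)
```

Because the codebook is keyed only by rotations, adding a new object under this scheme would require no new templates, which is the decoupling of categories and templates that the abstract highlights.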
