Abstract
Learning-based feature matching methods have been widely studied in recent years. The core issue in learning feature matching is how to learn (1) discriminative representations for feature points (or regions) within each image and (2) consensus representations for feature points across images. Recently, self- and cross-attention models have been exploited to address this issue. However, in many scenes, the feature sets are large, redundant, and contaminated by outliers. Previous self-/cross-attention models generally conduct message passing on all primal features, which leads to redundant learning and high computational cost. To mitigate these limitations, and inspired by recent seed matching methods, we propose a novel and efficient Anchor Matching Transformer (AMatFormer) for the feature matching problem. AMatFormer has two main aspects. First, it conducts self-/cross-attention mainly on a set of anchor features and leverages these anchor features as a message bottleneck to learn the representations for all primal features. Thus, it can be implemented efficiently and compactly. Second, AMatFormer adopts a shared FFN module to further embed the features of the two images into a common domain and thus learn consensus feature representations for the matching problem. Experiments on several benchmarks demonstrate the effectiveness and efficiency of the proposed AMatFormer matching approach.
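To make the anchor-bottleneck idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes single-head scaled dot-product attention without learned projections, residual connections, a two-layer ReLU FFN, and randomly chosen anchor indices as a stand-in for seed-based anchor selection. All function names (`attend`, `shared_ffn`, `anchor_matching_layer`) and all shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Single-head scaled dot-product attention (assumption: no learned projections)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def shared_ffn(x, w1, w2):
    """Two-layer ReLU FFN; the same weights are applied to both images."""
    return np.maximum(x @ w1, 0.0) @ w2


def anchor_matching_layer(feats_a, feats_b, idx_a, idx_b, w1, w2):
    """One hypothetical layer: attention runs on anchors, primal features read from anchors."""
    anchors_a, anchors_b = feats_a[idx_a], feats_b[idx_b]

    # Self-attention among the anchors of each image.
    anchors_a = anchors_a + attend(anchors_a, anchors_a, anchors_a)
    anchors_b = anchors_b + attend(anchors_b, anchors_b, anchors_b)

    # Cross-attention between the anchor sets of the two images.
    new_a = anchors_a + attend(anchors_a, anchors_b, anchors_b)
    new_b = anchors_b + attend(anchors_b, anchors_a, anchors_a)

    # Bottleneck: every primal feature is updated only through the anchors.
    out_a = feats_a + attend(feats_a, new_a, new_a)
    out_b = feats_b + attend(feats_b, new_b, new_b)

    # Shared FFN embeds both images with identical weights.
    return shared_ffn(out_a, w1, w2), shared_ffn(out_b, w1, w2)


rng = np.random.default_rng(0)
dim, n_a, n_b, k = 32, 500, 400, 16           # illustrative sizes only
feats_a = rng.standard_normal((n_a, dim))
feats_b = rng.standard_normal((n_b, dim))
idx_a = rng.choice(n_a, size=k, replace=False)  # placeholder anchor selection
idx_b = rng.choice(n_b, size=k, replace=False)
w1 = 0.1 * rng.standard_normal((dim, 2 * dim))
w2 = 0.1 * rng.standard_normal((2 * dim, dim))

emb_a, emb_b = anchor_matching_layer(feats_a, feats_b, idx_a, idx_b, w1, w2)
print(emb_a.shape, emb_b.shape)
```

In this sketch, with n primal features and k anchors per image, the attention score matrices have size k × k or n × k rather than n × n, which is where the efficiency gain would come from when k is much smaller than n.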