Attention-based multimodal image matching

Aviad Moreshet, Yosi Keller

doi:10.1016/j.cviu.2024.103949

Open Access

https://doi.org/10.1016/j.cviu.2024.103949

Journal: Computer Vision and Image Understanding
Publication Date: Feb 7, 2024
Citations: 2

#Multimodal Image Matching #Multimodal Matching

Abstract

We propose a method for matching multimodal image patches using a multiscale Transformer-Encoder that focuses on the feature maps of a Siamese CNN. It effectively combines multiscale image embeddings while improving task-specific and appearance-invariant image cues. We also introduce a residual attention architecture that allows for end-to-end training by using a residual connection. To the best of our knowledge, this is the first successful use of the Transformer-Encoder architecture in multimodal image matching. We motivate the use of task-specific multimodal descriptors by achieving new state-of-the-art accuracy on both multimodal and unimodal benchmarks, and demonstrate the quantitative and qualitative advantages of our approach over state-of-the-art unimodal image matching methods in multimodal matching. Our code is shared here: Code.
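
The data flow the abstract describes (shared feature extraction for both patches, attention over feature-map locations at several scales, a residual connection around the attention, and a combined multiscale embedding) can be illustrated with a toy sketch. This is not the authors' model: it is a minimal pure-Python illustration that assumes identity query/key/value projections, a single attention head, mean pooling per scale, and concatenation across scales. All function names (`self_attention`, `residual_attention`, `embed`, `l2_distance`) are hypothetical, and the feature maps are hand-written stand-ins for the outputs of a Siamese CNN.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections (no learned weights), for illustration only.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def residual_attention(tokens):
    """Residual connection around attention: output = input + attention(input)."""
    attended = self_attention(tokens)
    return [[x + a for x, a in zip(t, at)] for t, at in zip(tokens, attended)]


def flatten_feature_map(fmap):
    """Turn an H x W x C feature map (nested lists) into H*W tokens of size C."""
    return [pixel for row in fmap for pixel in row]


def embed(feature_maps):
    """Mean-pool the attended tokens of each scale and concatenate the scales."""
    embedding = []
    for fmap in feature_maps:
        tokens = residual_attention(flatten_feature_map(fmap))
        d = len(tokens[0])
        embedding.extend(sum(t[j] for t in tokens) / len(tokens) for j in range(d))
    return embedding


def l2_distance(a, b):
    """Euclidean distance between two embeddings; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical two-scale feature maps for two patches (stand-ins for CNN outputs).
fine_a = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]   # 2 x 2 x 2
coarse_a = [[[0.5, 0.5, 0.5]]]                                  # 1 x 1 x 3
fine_b = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [2.0, 2.0]]]
coarse_b = [[[0.1, 0.9, 0.2]]]

# Siamese use: the same embed function is applied to both patches.
emb_a = embed([fine_a, coarse_a])
emb_b = embed([fine_b, coarse_b])
distance = l2_distance(emb_a, emb_b)
```

In this sketch the residual path means the attended tokens are added to the original features rather than replacing them, so the original CNN features still reach the output unchanged alongside the attention output. A trained system would replace the identity projections and mean pooling with learned components.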
