Abstract

Local feature matching plays a vital role in many computer vision tasks. In this work, we present a novel network that combines feature matching and outlier rejection to find reliable correspondences between image pairs. The proposed method is a hybrid transformer-based graph neural network (GNN), termed HTMatch, which aims at accurate and efficient feature matching. Specifically, we first propose a hybrid transformer that integrates self- and cross-attention to condition the feature descriptors between image pairs. In this way, intra- and inter-graph attentional aggregation is realized by a single transformer layer, yielding more efficient message passing. Then, we introduce a new spatial embedding module to strengthen the spatial constraints across images: the spatial information of one image is embedded into the other, which significantly improves matching performance. Finally, we adopt a seeded GNN architecture to build a sparse graph, which improves both efficiency and effectiveness. Experiments show that HTMatch achieves state-of-the-art results on several public benchmarks.
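As a rough illustration of the hybrid-attention idea (this is our own minimal sketch, not the authors' implementation; the function names are hypothetical and learned Q/K/V projections are replaced by identities for brevity), a single attention pass over the stacked descriptors of both images lets each keypoint attend to points in its own image and in the other image simultaneously:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention(desc_a, desc_b):
    """One attention pass over the concatenation of both images'
    descriptors, so self- (intra-image) and cross- (inter-image)
    attention happen in a single layer rather than two.
    desc_a: (Na, d) descriptors of image A; desc_b: (Nb, d) of image B.
    """
    x = np.concatenate([desc_a, desc_b], axis=0)        # (Na+Nb, d)
    scores = x @ x.T / np.sqrt(x.shape[1])              # all-pairs affinities
    out = softmax(scores, axis=-1) @ x                  # aggregated messages
    na = desc_a.shape[0]
    return out[:na], out[na:]                           # split back per image
```

In a separate self-then-cross design, each update needs two attention layers per image; collapsing them into one pass over the joint set is what makes the message passing cheaper, at the cost of a larger attention matrix.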
