Abstract

Establishing reliable correspondences between two images is a fundamental task in computer vision. This paper proposes a novel network, the Sparse Graph Attention Network (SGA-Net), which captures rich contextual information from sparse graphs for the feature matching task. Specifically, a graph attention block is proposed to enhance the representational ability of graph-structured features. The block introduces a novel normalization technique for graph-structured features that embeds global information into each edge feature, and it adopts the squeeze-and-excitation mechanism to capture graph-wise contextual information. To further extract structural information from sparse graphs, a novel sparse graph transformer is developed based on the multi-head self-attention mechanism while maintaining permutation equivariance. Additionally, because the graph contexts in shallow layers are not fully exploited, a simple graph-context fusion block is introduced to adaptively capture topological information from different layers by implicitly modeling the interdependence between these graph contexts. SGA-Net can identify dependable candidates among the putative correspondences and simultaneously estimate accurate camera poses for two-view geometry estimation. Extensive experiments on outlier removal and camera pose estimation demonstrate that SGA-Net outperforms state-of-the-art methods on both outdoor and indoor benchmarks (YFCC100M and SUN3D, respectively). SGA-Net achieves an mAP5° of 58.88% without RANSAC on the outdoor dataset, and it improves precision by 13.45% and 7.34% over the state-of-the-art result on the outdoor and indoor datasets, respectively.
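
Two of the mechanisms named above, squeeze-and-excitation gating and multi-head self-attention without positional encoding, can be illustrated with a minimal NumPy sketch. This is not the SGA-Net architecture: the function names, weight shapes, and random weights are assumptions chosen for illustration, and the paper's normalization, sparse graph construction, and fusion block are not modeled. The sketch only shows why such operations are permutation-equivariant over a set of N correspondence features: reordering the input rows reorders the output rows in the same way.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def squeeze_excite(feats, w1, w2):
    # feats: (N, C) features, one row per putative correspondence.
    # Squeeze: average over all N rows to get one global descriptor (C,).
    squeezed = feats.mean(axis=0)
    # Excite: bottleneck MLP with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(squeezed @ w1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))
    # Rescale every row by the same channel-wise gate.
    return feats * gate


def multi_head_self_attention(feats, wq, wk, wv, num_heads):
    # feats: (N, C); wq, wk, wv: (C, C). No positional encoding is used,
    # so the output depends on row order only through a matching reorder.
    n, c = feats.shape
    d = c // num_heads

    def split(x):
        # (N, C) -> (H, N, d)
        return x.reshape(n, num_heads, d).transpose(1, 0, 2)

    q, k, v = split(feats @ wq), split(feats @ wk), split(feats @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (H, N, N)
    attn = softmax(scores, axis=-1)
    out = attn @ v  # (H, N, d)
    return out.transpose(1, 0, 2).reshape(n, c)


def block(feats, params, num_heads=4):
    # Hypothetical composition: attention followed by channel gating.
    wq, wk, wv, w1, w2 = params
    attended = multi_head_self_attention(feats, wq, wk, wv, num_heads)
    return squeeze_excite(attended, w1, w2)


def make_params(channels, reduction, rng):
    # Random illustrative weights; a trained network would learn these.
    scale = 1.0 / np.sqrt(channels)
    wq, wk, wv = (rng.normal(size=(channels, channels)) * scale for _ in range(3))
    w1 = rng.normal(size=(channels, channels // reduction)) * scale
    w2 = rng.normal(size=(channels // reduction, channels)) * scale
    return wq, wk, wv, w1, w2
```

Shuffling the rows of the input with a permutation and applying `block` should give the same result as applying `block` first and then shuffling the output rows, up to floating-point tolerance. That is the property the abstract refers to as permutation equivariance.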
