Abstract

Recovering camera pose from two-view images is a critical problem in photogrammetry and computer vision. In complex scenarios, point correspondences constructed by off-the-shelf feature matchers such as SIFT are often corrupted by heavy outliers. In this case, traditional methods based on sampling consensus or motion/geometric coherence struggle because their underlying assumptions no longer hold. To this end, we propose a deep technique that better extracts the underlying geometric information from a high-dimensional feature space for two-view geometry estimation. Unlike existing deep methods that use distribution-based normalization or explicitly aggregate neighboring correspondences, we propose a multi-head graph attention operation, termed GANet, to latently capture fine-grained contextual/geometric relations among the corrupted correspondences. This encourages the network to learn informative representations that ensure high graph similarity, thereby focusing on inliers and suppressing outliers. On this basis, the network can more easily infer the inliers that best recover the camera pose. Moreover, we observe that the graph-similarity calculation for each node is supported by only a subset of node features. We therefore propose a lightweight implementation of graph attention, namely Sparse GANet, which learns a sparse attention map via a block-wise operation and Sinkhorn normalization. This sparse strategy largely reduces memory and computational cost while maintaining performance. Extensive experiments on pose estimation, outlier rejection, and image registration across challenging datasets, together with combined tests using different descriptor matchers and robust estimators, demonstrate the superiority and strong generalization of our method against the state of the art.
In particular, we achieve at least 1.5% and 0.6% mAP@5° improvement on the YFCC and SUN3D datasets for pose estimation, respectively. Moreover, our Sparse GANet reduces the model size to only 0.28 MB and the time cost to 16 ms, significantly outperforming SuperGlue, which requires 12.02 MB and 68 ms. (Source code is available at https://github.com/StaRainJ/Code-of-GANet.)
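The Sinkhorn normalization mentioned above can be illustrated with a minimal sketch: starting from raw attention scores, alternating row and column normalizations drive the map toward a doubly stochastic matrix, which is what balances attention mass across correspondences. This is a generic NumPy illustration of the normalization step only, not the paper's actual implementation; all names and the iteration count are illustrative.

```python
import numpy as np

def sinkhorn(scores, n_iters=10):
    """Turn a raw score matrix into an (approximately) doubly
    stochastic attention map via alternating row/column scaling."""
    P = np.exp(scores - scores.max())  # positive entries, numerically stable
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # normalize each row
        P = P / P.sum(axis=0, keepdims=True)  # normalize each column
    return P

rng = np.random.default_rng(0)
attn = sinkhorn(rng.standard_normal((4, 4)))
```

After a few iterations, each row and column of `attn` sums to approximately 1, so no single correspondence can monopolize the attention budget.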
