Abstract

Visual localization, a vital component of many visual applications, has been tackled by scene coordinate regression (SCoRe) methods that use neural networks to predict scene coordinates, followed by a PnP algorithm to recover the camera pose. However, these methods do not consider the relationships between image patches, known as relative features or affinity information, which are instrumental for the network to perform complete scene parsing. Moreover, owing to the visual similarity between image patches, these methods are weak at extracting reliable absolute features that represent the context information of the image patches, resulting in inferior localization performance. In response, we propose EAAINet, which builds on classical SCoRe approaches and consists of two novel modules: the Global Affinity Aggregation Module (GAAM) and the Element-wise Attention Module (EAM). Specifically, GAAM employs an interval sampling strategy to sample image patches and construct sparse graph neural networks (GNNs), from which global affinity information between image patches is retrieved, thereby ensuring precise scene parsing. EAM integrates multi-level features to generate reliable absolute features for regressing accurate scene coordinates, with the key insight that structure information is essential for differentiating similar image patches, while semantic information assists in modeling the regression problem. Technically, EAM predicts element-wise soft attention masks to reconcile multi-level feature maps, enabling efficient feature fusion. Positional encoding and uncertainty modeling are also employed to enhance visual localization performance. Experimental results show that EAAINet significantly outperforms state-of-the-art methods on multiple benchmarks, with faster speed and fewer model parameters. Source code is available at https://github.com/DK-HU/EAAINet.
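
As a rough illustration of two ideas named above, the following sketch shows one plausible reading of interval sampling (keeping every k-th patch index) and of element-wise soft attention fusion (a per-element sigmoid mask that blends a structure feature with a semantic feature). It is not the EAAINet implementation: the function names `interval_sample` and `fuse_with_soft_mask`, the fixed stride, and the convex-combination blending rule are all assumptions made for illustration, and in the actual network the mask logits would be predicted by learned layers.

```python
import math


def interval_sample(num_patches, stride):
    """Return the indices of every `stride`-th patch.

    Hypothetical sampling rule: the abstract only says that patches are
    sampled at intervals to build a sparse graph.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, num_patches, stride))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fuse_with_soft_mask(structure_feat, semantic_feat, mask_logits):
    """Blend two equal-length feature vectors element by element.

    Assumed rule: fused[i] = m[i] * structure[i] + (1 - m[i]) * semantic[i],
    where m[i] = sigmoid(mask_logits[i]) lies in (0, 1).
    """
    if not (len(structure_feat) == len(semantic_feat) == len(mask_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for s, t, z in zip(structure_feat, semantic_feat, mask_logits):
        m = sigmoid(z)
        fused.append(m * s + (1.0 - m) * t)
    return fused


# Example: sample 10 patches with stride 3, then fuse two toy features.
nodes = interval_sample(10, 3)
fused = fuse_with_soft_mask([1.0, 1.0], [3.0, 3.0], [0.0, 50.0])
```

With a logit of zero the mask weights both inputs equally, and with a large positive logit the output approaches the structure feature, which is the sense in which a soft mask "reconciles" two feature levels in this sketch.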
