Abstract

Image-based geo-localization estimates the location of a query image by matching it against a large database of geo-tagged images. This matching task is challenging due to the vast differences in visual appearance or modality between image pairs captured from different platforms, e.g., one image from an RGB camera and the other from a light detection and ranging (LiDAR) sensor. The spatial layout of a scene provides important cues and can significantly reduce matching ambiguity. We therefore propose a novel deep network that embeds the spatial configuration of the scene into the feature representation. Specifically, we design a spatial-scale attention (SSA) module to highlight salient, corresponding layout features at different scales. The encoded features represent not only the presence of certain objects but also their relative locations. In this way, we learn more discriminative deep feature representations, leading to higher recall. Experimental results on two standard cross-view benchmark datasets (CVUSA and CVACT) and a cross-modal dataset (GRAL) demonstrate that our method outperforms state-of-the-art methods. Remarkably, the top-1 recall rate on the GRAL dataset improves from 27.6% in [1] to 40.5%.
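The abstract does not give implementation details for the SSA module, but the description (per-scale salience weighting of layout features, fused so both object presence and relative location survive) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the class name `SpatialScaleAttention`, the pyramid scales `(1, 2, 4)`, and the 1x1-conv attention branches are hypothetical choices, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialScaleAttention(nn.Module):
    """Hypothetical sketch of a spatial-scale attention (SSA) block.

    Pools the input feature map at several spatial scales, scores each
    scale with a lightweight attention branch, and fuses the re-weighted
    maps so the output reflects both which objects appear and where they
    sit relative to each other. Scale choices are illustrative only.
    """

    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One 1x1 conv per scale produces a single-channel salience map.
        self.attn = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in scales
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        out = torch.zeros_like(x)
        for scale, attn_conv in zip(self.scales, self.attn):
            # Downsample to the current scale, then restore resolution,
            # so each branch captures context at a different granularity.
            pooled = F.adaptive_avg_pool2d(
                x, (max(h // scale, 1), max(w // scale, 1))
            )
            upsampled = F.interpolate(
                pooled, size=(h, w), mode="bilinear", align_corners=False
            )
            weights = torch.sigmoid(attn_conv(upsampled))  # per-pixel salience
            out = out + weights * upsampled
        return out / len(self.scales)


# Usage: attend over a backbone feature map before global aggregation.
feat = torch.randn(2, 256, 32, 32)      # e.g. CNN features of a query image
ssa = SpatialScaleAttention(channels=256)
print(ssa(feat).shape)                  # torch.Size([2, 256, 32, 32])
```

The attended features would then feed a metric-learning head that matches query and database embeddings across views or modalities, as the abstract describes.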

