Abstract

Thanks to street-level imagery from Google Maps Street View, visual geo-localization can estimate the coarse location of a query image through visual place recognition. However, this becomes very challenging when non-static objects change over time, severely degrading image retrieval accuracy. We address city-scale visual place recognition in complex urban environments crowded with non-static clutter. To this end, we first analyze which kinds of clutter degrade similarity matching between query and database images. Second, we design a self-supervised, trainable de-attention module that prevents the network from focusing on non-static objects in an input image. In addition, we propose a novel triplet margin loss, the sharpened triplet marginal loss, to make feature descriptors more discriminative. Lastly, because public geo-tagged datasets with a high density of non-static objects are scarce, we propose a clutter augmentation method to evaluate our approach. Experimental results show that our model notably improves over existing attention methods on geo-localization tasks, both on public benchmark datasets and on their augmented versions with heavy pedestrian and traffic clutter. Our code is available at https://github.com/ccsmm78/deattention_with_stml_for_vpr.
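The abstract does not detail the proposed sharpened triplet marginal loss, but it is described as a variant of the standard triplet margin loss used for metric learning in visual place recognition. As background, the sketch below shows the conventional triplet margin loss that such a variant builds on; the function name and the margin value are illustrative, not taken from the paper.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.1):
    # Conventional triplet margin loss: push the anchor-positive distance
    # to be at least `margin` smaller than the anchor-negative distance.
    # Descriptors are compared with Euclidean distance here for simplicity.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D "descriptors" (real VPR descriptors are high-dimensional).
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # positive: image of the same place
n = np.array([1.0, 0.0])   # negative: image of a different place
easy_loss = triplet_margin_loss(a, p, n)            # negative is far -> 0.0
hard_loss = triplet_margin_loss(a, p, np.array([0.15, 0.0]))  # negative too close -> positive loss
```

In training, triplets are mined from the geo-tagged database so that positives share the query's location and negatives do not; the loss is then averaged over a batch of such triplets.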
