Place recognition (PR) plays a crucial role in simultaneous localization and mapping (SLAM). However, changes in viewpoint and conditions in large-scale environments make PR challenging. To address this, this paper proposes an explicit points-of-interest driven PR method, which consists of a road segmentation module based on a grid-wise patch U-Transformer (GP_UT) and a PR module based on a regions-of-interest Siamese Transformer NetVLAD (RI_STV). RI_STV operates along three dimensions. In the individual dimension, it explores the local topological features of non-road regions of interest. In the spatial dimension, an improved Transformer captures the global interactions between features of interest. In the cluster dimension, a NetVLAD with embedded weighted pooling performs weighted aggregation of feature clusters to generate discriminative and general descriptors. Evaluation on various datasets shows that the proposed method is not only highly competitive but also strikes the best balance between accuracy and real-time performance.
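To make the cluster-dimension idea concrete, the following is a minimal sketch of VLAD-style aggregation with per-feature weights. It is not the RI_STV implementation: the function name `weighted_netvlad`, the `weights` input (a hypothetical per-feature importance score), the fixed `centers`, and the distance-based soft assignment with sharpness `alpha` are all assumptions made for illustration. In a trained NetVLAD layer the assignment parameters and centers would be learned.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def weighted_netvlad(features, weights, centers, alpha=10.0):
    """Aggregate N local features (each of length D) into one K*D descriptor.

    features: list of N feature vectors
    weights:  list of N non-negative importance scores (assumed input)
    centers:  list of K cluster centers (fixed here, learned in practice)
    """
    num_clusters, dim = len(centers), len(centers[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]
    for x, w in zip(features, weights):
        # Soft assignment: softmax over negative scaled squared distances.
        logits = [
            -alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            for c in centers
        ]
        assign = softmax(logits)
        # Accumulate weighted residuals to every cluster center.
        for k in range(num_clusters):
            for d in range(dim):
                vlad[k][d] += w * assign[k] * (x[d] - centers[k][d])
    # Intra-normalize each cluster's residual, flatten, then normalize globally.
    flat = [value for row in vlad for value in l2_normalize(row)]
    return l2_normalize(flat)


features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.2]]
weights = [1.0, 1.0, 2.0]
centers = [[1.0, 0.0], [0.0, 1.0]]
descriptor = weighted_netvlad(features, weights, centers)
```

With this weighting, a feature whose weight is zero contributes nothing to the descriptor, which is one simple way that features from uninformative regions could be suppressed during aggregation.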