Articles published on Place Recognition
1040 Search results
- New
- Research Article
- 10.1109/lra.2026.3683589
- Jun 1, 2026
- IEEE Robotics and Automation Letters
- Dejing Zhou + 7 more
PHMRNet: Persistent Homology Based Mamba–RWKV Network for LiDAR Place Recognition
- New
- Research Article
- 10.1016/j.isprsjprs.2026.03.037
- Jun 1, 2026
- ISPRS Journal of Photogrammetry and Remote Sensing
- Xufei Wang + 6 more
Ranking-aware continual learning for LiDAR place recognition
- Research Article
- 10.1016/j.neucom.2026.133112
- May 1, 2026
- Neurocomputing
- Marcos Alfaro + 4 more
Omnidirectional cameras are a suitable and cost-effective choice for Visual Place Recognition (VPR), as they provide comprehensive information from the scene regardless of the robot orientation. However, vision sensors are vulnerable to environmental appearance changes (e.g., illumination, weather, season or moving objects). While multi-modal sensing approaches can overcome these challenges, they introduce significant cost and system complexity. This paper introduces PDPR (Panoramic-Depth Place Recognition), a novel fusion framework that enhances the robustness of VPR methods by integrating visual data with geometric features derived from monocular depth estimation techniques, while using a single-camera setup. In the ablation study, both early and late fusion strategies are evaluated to optimally combine appearance-based and depth-derived features. The extensive evaluation on challenging indoor and outdoor datasets demonstrates that PDPR consistently boosts retrieval performance across multiple state-of-the-art VPR models. Furthermore, this improvement is achieved without requiring any fine-tuning, allowing our method to function as a pluggable module for pretrained models. Consequently, this work presents a powerful, practical and low-cost solution for robust VPR, with high potential to scale as monocular depth estimation and VPR models continue to improve. The project website can be found at https://marcosalfaro.github.io/projects-PDPR/ .
• Monocular depth estimation is used to enhance place recognition.
• A thorough evaluation of preprocessing techniques to enhance the depth maps.
• Fusion techniques are designed to leverage visual and geometric data.
• A model-agnostic approach that improves the performance even with no fine-tuning.
• A robust method across different scenarios and lighting conditions.
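The abstract above mentions late fusion of appearance-based and depth-derived features without giving details. The following is a minimal sketch of one generic late-fusion scheme (normalize each descriptor, weight, concatenate, renormalize), not the PDPR implementation. The function names and the `w_depth` weight are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def late_fusion(visual: Sequence[float], depth: Sequence[float],
                w_depth: float = 0.5) -> List[float]:
    """Assumed scheme: weight the two normalized descriptors, concatenate, renormalize."""
    v = [(1.0 - w_depth) * x for x in l2_normalize(visual)]
    d = [w_depth * x for x in l2_normalize(depth)]
    return l2_normalize(v + d)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two unit-length vectors is their dot product."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: Sequence[float], database: Sequence[Sequence[float]]) -> int:
    """Return the index of the database descriptor most similar to the query."""
    return max(range(len(database)), key=lambda i: cosine(query, database[i]))


# Two places with identical appearance descriptors but different depth descriptors.
database = [late_fusion([1.0, 0.0], [1.0, 0.0]),
            late_fusion([1.0, 0.0], [0.0, 1.0])]
query = late_fusion([1.0, 0.0], [0.0, 1.0])
best = retrieve(query, database)  # the depth part breaks the tie
```

In this toy case the appearance descriptors alone cannot separate the two places, so the depth half of the fused descriptor decides the match.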
- Research Article
- 10.1088/1361-6501/ae61dc
- May 1, 2026
- Measurement Science and Technology
- Wen Hao + 2 more
YawPR-Net: a yaw-robust network for large-scale point cloud based place recognition
- Research Article
- 10.1016/j.knosys.2026.115677
- May 1, 2026
- Knowledge-Based Systems
- Qilong Wu + 7 more
Dual-stage method with memory-augmented embedding learning and attention-guided re-ranking for robust visual place recognition
- Research Article
- 10.1162/jocn.a.2611
- May 1, 2026
- Journal of Cognitive Neuroscience
- Annie Cheng + 4 more
Humans often rely on environmental boundaries for place recognition and navigation. However, what defines an effective boundary for human cortical scene processing remains unclear. Despite the prominent use of extended surfaces (e.g., walls) as environmental boundaries in the literature, some evidence suggests that effective boundaries may instead be qualified by their ability to mark the 3D shape of a local environment. In this study, we directly test this possibility using tightly controlled, artificial images of boundaries that define the same 3D shape of a local environment, systematically manipulating the number of poles to vary the internal structure of these boundaries. We hypothesize that if the human cortical scene-processing system encodes boundaries based on the geometric shape marked by boundary elements rather than surface continuity, it will represent geometrically equivalent boundaries similarly, regardless of whether a boundary is made up of wall surfaces or a varying number of poles. Using fMRI, we found that even a few isolated poles marking the vertices of a local 3D space were sufficient to elicit a wall-like representation in the parahippocampal place area, revealing its sensitivity to environmental shape composed of non-wall boundaries. The occipital place area was sensitive to graded variations in boundary structure, tracking continuous surface-like properties. Together, these results reveal neural sensitivity to non-wall boundaries in the human scene-selective cortical system and shed light on the distinct boundary features that support the encoding of environmental geometry across different cortical regions.
- Research Article
- 10.1088/2631-8695/ae68dd
- May 1, 2026
- Engineering Research Express
- Junwei Fu + 4 more
High-precision train self-localization is a crucial component for enabling autonomous railway operations. Conventional train localization methods heavily rely on trackside infrastructure. This paper proposes a Visual Virtual Balise (VVB) system based on visual place recognition, which replaces physical balises with camera-based visual landmarks. By storing compact global descriptors instead of raw images, the proposed system satisfies onboard storage constraints and real-time retrieval requirements, enabling infrastructure-free train localization. Furthermore, to address the strong linear structure and high visual repetition inherent in railway scenes, we propose RailVLAD, a railway-specific trainable feature aggregation network. By fusing multi-layer convolutional features, RailVLAD improves discriminative power while preserving a lightweight architecture suitable for onboard deployment. Experiments at the China National Railway Track Test Center show that the method achieves 93.68% retrieval accuracy, surpassing SIFT, NetVLAD, and VGG16 by 11.92%, 11.87%, and 6.04%, respectively. The system achieves a positioning accuracy of 1.88 m (error <2 m), operates at 12 FPS on Jetson AGX Xavier, and can be integrated with multi-sensor localization frameworks.
- Research Article
- 10.1016/j.ecoinf.2026.103780
- May 1, 2026
- Ecological Informatics
- Judith Vilella-Cantos + 4 more
Low cost, high efficiency: LiDAR place recognition in vineyards with Matryoshka Representation Learning
- Research Article
- 10.3390/s26092799
- Apr 30, 2026
- Sensors (Basel, Switzerland)
- Yu-Hong Jian + 1 more
EigenPlaces is a state-of-the-art visual place recognition (VPR) method that constructs training classes via SVD-based focal points, where a fixed focal distance D controls how far the focal point is placed from each cell center. However, this globally fixed D cannot adapt to the diverse scene geometries encountered across different urban environments. In this work, we systematically analyze the sensitivity of D across multiple benchmark datasets and reveal that the optimal D value is highly dataset-dependent, with performance gaps of up to 4.4 percentage points between the best and worst D choices. We then propose a depth-aware adaptive D strategy that leverages monocular depth estimation to compute per-cell focal distances, combined with quantile mapping to ensure sufficient variance in the assigned D values. By establishing a principled connection between visual sensor data and geometric training supervision, our method enhances the environmental perception reliability of intelligent sensing platforms. Experiments on three benchmarks (Pitts30k, AmsterTime, SF-XL) validate the dataset-dependent nature of D and confirm that our depth-aware approach achieves the best same-distribution performance among all tested configurations. We further conduct a multi-strategy ablation comparing depth raw, depth quantile, and SVD eigenvalue ratio approaches, providing practical guidance for adaptive focal distance selection in VPR training pipelines.
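The abstract above combines per-cell depth estimates with quantile mapping so that the assigned D values have sufficient variance. As a rough illustration only, the sketch below maps each cell's depth rank to a quantile and then linearly into a target range. The function name `quantile_map` and the bounds `d_min` and `d_max` are assumptions, and the paper's actual mapping may differ.

```python
from typing import List, Sequence


def quantile_map(depths: Sequence[float], d_min: float, d_max: float) -> List[float]:
    """Map each depth to a focal distance by its rank (empirical quantile).

    The smallest depth maps to d_min and the largest to d_max, so the outputs
    spread evenly over the range even when the raw depths are tightly clustered.
    Ties receive distinct ranks in input order.
    """
    n = len(depths)
    if n == 1:
        return [(d_min + d_max) / 2.0]
    order = sorted(range(n), key=lambda i: depths[i])
    mapped = [0.0] * n
    for rank, idx in enumerate(order):
        q = rank / (n - 1)  # quantile in [0, 1]
        mapped[idx] = d_min + q * (d_max - d_min)
    return mapped


# Three clustered depths and one outlier (hypothetical values, in metres).
cell_depths = [2.0, 50.0, 2.1, 2.2]
focal_distances = quantile_map(cell_depths, d_min=10.0, d_max=40.0)
```

Using the raw depths directly would place three cells almost on top of each other and one far away. The rank-based mapping spaces all four cells evenly between the two bounds.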
- Research Article
- 10.3390/electronics15091810
- Apr 24, 2026
- Electronics
- Jaehun Kim + 1 more
Many place recognition approaches, which identify previously visited places or locations by matching current sensory data, such as 2D RGB images and 3D point clouds, have been proposed to achieve accurate and robust localization and loop closure detection in global positioning system (GPS)-denied environments. Since visual place recognition (VPR) methods that rely on images captured by camera sensors are highly sensitive to variations in appearance, including changes in lighting, surface color, and shadows, they can lead to poor place recognition accuracy. In contrast, light detection and ranging (LiDAR)-based place recognition (LPR) approaches based on 3D point cloud data that captures the shape and geometric structure of the environment are robust to changes in place appearance and can therefore provide more reliable place recognition results than VPR methods. This work presents an indoor LPR method called PointNetVLAD-based indoor pedestrian localization (PIPL). PIPL is a deep network model that uses PointNetVLAD to learn to extract global descriptors from 3D LiDAR point cloud data. PIPL can recognize places previously visited by a pedestrian using point clouds captured by a low-cost LiDAR sensor on a smartphone in small-scale indoor environments, while PointNetVLAD performs place recognition for vehicles using high-cost LiDAR, GPS, and inertial measurement unit (IMU) sensors in large-scale outdoor areas. For place recognition on 3D point cloud reference maps generated from LiDAR scans, PointNetVLAD exploits the universal transverse mercator (UTM) coordinate system based on GPS and IMU measurements, whereas PIPL uses a virtual coordinate system designed in this study due to the unavailability of GPS indoors. In experiments conducted in campus buildings, PIPL shows significant advantages over NetVLAD (known as a convolutional neural network (CNN)-based VPR method). 
Particularly in indoor environments with repetitive scenes where geometric structures are preserved and image-based appearance features are sparse or unclear, PIPL achieved 39% higher top-1 accuracy and 10% higher top-3 accuracy compared to NetVLAD. Furthermore, PIPL achieved place recognition accuracy comparable to NetVLAD even with a small number of points in a 3D point cloud and outperformed NetVLAD even with a smaller model training dataset. The experimental results also indicate that PIPL requires over 76% less place retrieval time than NetVLAD while maintaining robust place classification performance.
- Research Article
- 10.1016/j.eswa.2025.130756
- Apr 1, 2026
- Expert Systems with Applications
- Zhenyu Li + 1 more
Quadruplet-attention transformer for scale-invariant robot place recognition
- Research Article
- 10.1016/j.neunet.2026.109001
- Apr 1, 2026
- Neural Networks: the official journal of the International Neural Network Society
- Hang Yang + 5 more
Discriminative region learning for point cloud-based place recognition
- Research Article
- 10.1016/j.robot.2025.105315
- Apr 1, 2026
- Robotics and Autonomous Systems
- Roun Lee + 2 more
Terrain-based place recognition for LiDAR SLAM of quadruped robots with limited field-of-view measurements
- Research Article
1
- 10.2174/011570162x379049251119103646
- Mar 9, 2026
- Current HIV Research
- Robert Lalonde + 1 more
Infection with the human immunodeficiency virus (HIV) causes human neuropsychological disorders, such as apathy and hypokinesia, as well as deficits in motor skills, selective attention, and learning. Based on findings from multiple studies, similar signs have been reproduced in animal models following intracerebral injections of HIV-infected human monocytes or monocyte-derived macrophages, or exposure to gp120, Tat, or Nef. These include learning deficits in Morris, radial arm, and Barnes mazes; impairments in novel object place and shape recognition; motor coordination deficits on stationary and mobile beams; and hypoactivity. Relative to non-transgenic controls, deficits in most tests have also been reproduced in transgenic mice or rats expressing HIV-1 or related proteins. There is evidence that corticosterone contributes to these behavioral abnormalities, which may have implications for treating AIDS dementia complex, given its ability to exacerbate the neurotoxic effects of gp120 in tissue cultures. Possible mechanisms include corticosterone-induced worsening of lipid peroxidation, inhibition of aspartate uptake, increased calcium mobilization, and reduced ATP levels.
- Research Article
- 10.3390/s26051561
- Mar 2, 2026
- Sensors (Basel, Switzerland)
- Wen Liu + 3 more
Place recognition is a fundamental challenge for robotics and autonomous vehicles. While visual place recognition has achieved high precision, cross-modal place recognition, specifically visual localization within large-scale point cloud maps, remains a formidable problem. Existing methods often struggle with the significant domain gap between modalities and can be computationally prohibitive, especially those processing raw 3D point clouds. Furthermore, they frequently fail to learn features invariant to viewpoint and scale variations, limiting generalization to unseen environments. In this paper, we formulate cross-modal recognition as a problem of learning a scale-invariant, unified embedding space. Our framework employs a hierarchical Swin Transformer to extract multi-scale features from unified 2D representations of both modalities. The central principle of our method is a multi-scale self-distillation paradigm, which recasts feature learning as an intra-modal knowledge transfer task. Specifically, the coarse-scale "teacher" features provide supervision for the fine-scale "student" features. The final inter-modal alignment is then achieved via a global contrastive loss, exclusively leveraging the semantically rich "teacher" embeddings to ensure a reliable and discriminative matching. Extensive experiments on the KITTI and KITTI-360 datasets demonstrate that our method achieves state-of-the-art performance. Notably, using only the KITTI-trained model without fine-tuning, Recall@1 exceeds 60% on all evaluable sequences of KITTI-360 at a 10 m threshold. Code and pre-trained models will be made publicly available upon acceptance.
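The abstract above reports Recall@1 at a 10 m threshold. For readers unfamiliar with the metric, here is a minimal sketch of how it is commonly computed: a query counts as correct when its single nearest database descriptor lies within the distance threshold of the query's true position. The function and variable names are hypothetical, and the paper's evaluation protocol may differ in detail.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recall_at_1(query_desc: Sequence[Sequence[float]], query_pos: Sequence[Point],
                db_desc: Sequence[Sequence[float]], db_pos: Sequence[Point],
                threshold_m: float = 10.0) -> float:
    """Fraction of queries whose top-1 retrieved place is within threshold_m metres."""
    hits = 0
    for desc, pos in zip(query_desc, query_pos):
        # Top-1 retrieval: nearest database descriptor in embedding space.
        best = min(range(len(db_desc)), key=lambda i: sq_dist(desc, db_desc[i]))
        # Correct only if the retrieved place is geographically close enough.
        if math.dist(pos, db_pos[best]) <= threshold_m:
            hits += 1
    return hits / len(query_desc)


# Toy data: two mapped places 100 m apart and three queries.
db_desc = [[0.0, 0.0], [1.0, 1.0]]
db_pos = [(0.0, 0.0), (100.0, 0.0)]
query_desc = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]]
query_pos = [(3.0, 4.0), (95.0, 0.0), (50.0, 0.0)]
score = recall_at_1(query_desc, query_pos, db_desc, db_pos)
```

The third query retrieves the first place, which is 50 m from its true position, so it counts as a miss and the score is two out of three.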
- Research Article
- 10.1016/j.neucom.2026.133399
- Mar 1, 2026
- Neurocomputing
- Chenxu Wang + 3 more
Visual Place Recognition (VPR) is essential for robotics and autonomous navigation, yet most methods rely on heavy task-specific training. Existing approaches fall into two main paradigms: single-stage models that learn compact global descriptors, and two-stage pipelines that combine coarse global retrieval with local feature or geometric verification. While effective, both require large annotated datasets and carefully tuned optimization, limiting scalability and cross-domain reuse. We introduce TF-VPR, a new benchmark that tackles a more challenging setting: VPR performed entirely without additional training, where descriptors are generated, refined and matched only at test time. Enabled by recent Vision Foundation Models (VFMs), TF-VPR systematically evaluates how far pretrained VFMs can be pushed for place recognition when used as-is, and provides a standardized protocol for fairly comparing arbitrary VFMs without fine-tuning. To support this, we unify major VPR datasets covering diverse real-world conditions and propose two lightweight, training-free modules: Training-Free Graph-Attention Graph Module (TF-GAM) and Training-Free Cross-Attention Module (TF-CAM). These plug-and-play modules enhance descriptor discriminability and retrieval robustness. Experiments show that TF-VPR exposes new challenges and reveals previously unexplored strengths of VFMs for training-free place recognition. Code and datasets are available at https://github.com/ddfs430/TF-VPR .
- Research Article
- 10.1109/lra.2026.3653332
- Mar 1, 2026
- IEEE Robotics and Automation Letters
- Daniel Casado Herraez + 5 more
Localization of autonomous vehicles in existing maps is crucial for reliable navigation. Using previously constructed maps allows vehicles to estimate their pose without the inherent odometry drift. Building such maps involves aligning data recorded at different times and maintaining the map over time. While LiDAR sensors are commonly used for mapping due to their high accuracy, they are sensitive to adverse weather and involve high production costs. In this paper, we address the problem of long-term mapping and localization leveraging automotive radars, which are robust to weather conditions and offer a cost-effective alternative to LiDARs. In our approach, we construct maps of coinciding areas and align them by performing place recognition between them. Additionally, our multi-sequence loop detection and verification strategy for radar sensors is able to filter incorrect loop matches, enhancing trajectory alignment. Then, our novel map maintenance module handles radar noise and preserves persistent map points that remain reliable for localization. Subsequently, we estimate the robot poses in the resulting map by combining local odometry with scan-to-map matching, overcoming the complexities of sparse automotive radar data. We evaluate our method on public automotive radar datasets. The results show that our approach achieves state-of-the-art trajectory alignment, preserves persistent map points for localization, and reliably localizes within the constructed maps. The project page of the paper is available at https://www.ipb.uni-bonn.de/html/projects/long-term-localization-and-mapping .
- Research Article
- 10.1016/j.measurement.2025.120174
- Mar 1, 2026
- Measurement
- Jiakai Lu + 3 more
Dome: fast and robust LiDAR place recognition via spherical three-view feature fusion
- Research Article
3
- 10.1109/tpami.2025.3629287
- Mar 1, 2026
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Feng Lu + 6 more
Recent studies show that the visual place recognition (VPR) method using pre-trained visual foundation models can achieve promising performance. In our previous work, we propose a novel method to realize seamless adaptation of foundation models to VPR (SelaVPR). This method can produce both global and local features that focus on discriminative landmarks to recognize places for two-stage VPR by a parameter-efficient adaptation approach. Although SelaVPR has achieved competitive results, we argue that the previous adaptation is inefficient in training time and GPU memory usage, and the re-ranking paradigm is also costly in retrieval latency and storage usage. In pursuit of higher efficiency and better performance, we propose an extension of the SelaVPR, called SelaVPR++. Concretely, we first design a parameter-, time-, and memory-efficient adaptation method that uses lightweight multi-scale convolution (MultiConv) adapters to refine intermediate features from the frozen foundation backbone. This adaptation method does not back-propagate gradients through the backbone during training, and the MultiConv adapter facilitates feature interactions along the spatial axes and introduces proper local priors, thus achieving higher efficiency and better performance. Moreover, we propose an innovative re-ranking paradigm for more efficient VPR. Instead of relying on local features for re-ranking, which incurs huge overhead in latency and storage, we employ compact binary features for initial retrieval and robust floating-point (global) features for re-ranking. To obtain such binary features, we propose a similarity-constrained deep hashing method, which can be easily integrated into the VPR pipeline. Finally, we improve our training strategy and unify the training protocol of several common training datasets to merge them for better training of VPR models. 
Extensive experiments show that SelaVPR++ is highly efficient in training time, GPU memory usage, and retrieval latency (6000× faster than TransVPR), and that it outperforms the state-of-the-art methods by a large margin (ranking 1st on the MSLS challenge leaderboard).
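The abstract above describes retrieving candidates with compact binary features and re-ranking them with floating-point global features. The sketch below illustrates that two-stage idea under a strong simplification: the binary codes come from plain sign thresholding, which stands in for the paper's similarity-constrained deep hashing and is not the authors' method. All names and the `shortlist` parameter are hypothetical.

```python
from typing import List, Sequence


def binarize(desc: Sequence[float]) -> int:
    """Assumed stand-in for learned hashing: one bit per dimension (1 if positive)."""
    bits = 0
    for x in desc:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product similarity between two float descriptors."""
    return sum(x * y for x, y in zip(a, b))


def search(query: Sequence[float], database: Sequence[Sequence[float]],
           shortlist: int = 2) -> List[int]:
    """Stage 1: shortlist by Hamming distance. Stage 2: re-rank by float similarity."""
    q_bits = binarize(query)
    codes = [binarize(d) for d in database]  # would be precomputed offline in practice
    candidates = sorted(range(len(database)),
                        key=lambda i: hamming(q_bits, codes[i]))[:shortlist]
    return sorted(candidates, key=lambda i: dot(query, database[i]), reverse=True)


database = [[0.9, 0.1, -0.2], [0.2, 0.9, -0.1], [-0.5, -0.5, 0.7]]
query = [0.3, 0.8, -0.1]
ranking = search(query, database)
```

Here the first two database entries share the query's binary code, so the cheap Hamming stage cannot order them. The float re-ranking then places the second entry first because its inner product with the query is larger.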
- Research Article
1
- 10.1016/j.neucom.2025.132539
- Mar 1, 2026
- Neurocomputing
- Shanshan Wan + 5 more
SciceVPR: Stable cross-image correlation enhanced model for visual place recognition