Accurate localization in GPS-denied environments is a long-standing problem in computer vision and robotics research. In indoor environments, vision-based localization methods are sensitive to changes in lighting, viewing angle, and other environmental factors, which leads to localization failures or limited generalization. In this paper, we propose the TransCNNLoc framework, an encoding–decoding network designed to learn more robust image features for camera pose estimation. In the encoding stage, a CNN and a Swin Transformer are integrated into a single image feature encoding module, enabling the network to extract both global context and local features from images. In the decoding stage, multi-level image features are decoded through cross-layer connections, and per-pixel feature weight maps are computed. To improve the framework's robustness to dynamic objects, a dynamic object recognition network is introduced to optimize the feature weights. Finally, a multi-level, coarse-to-fine iterative optimization recovers the six-degree-of-freedom (6-DoF) camera pose. Experiments were conducted on the publicly available 7-Scenes dataset and on a dataset collected under changing lighting conditions and in dynamic scenes to validate and analyze accuracy. The experimental results show that the proposed TransCNNLoc framework adapts well to dynamic scenes and lighting changes. On the static scenes of the public dataset, the proposed method achieves a localization accuracy of up to 5 centimeters and obtains superior results in the majority of scenes. Under dynamic scenes and changing illumination, it achieves an accuracy of up to 3 centimeters.
This represents an improvement in accuracy from the decimeter scale to the centimeter scale, a significant advance over existing state-of-the-art (SOTA) algorithms. The source code for the proposed method is available at github.com/Geelooo/TransCNNloc.