Abstract

Camera-based indoor localization is fundamental to indoor navigation, virtual reality, and location-based services. Deep learning methods have exhibited remarkable performance with low storage requirements and high efficiency. However, existing methods mainly derive features implicitly for pose regression, without exploiting the explicit structural information available in images. This paper proposes that incorporating such information can improve the localization performance of learning-based approaches. We extract structural information from RGB images in the form of depth maps and edge maps, and we design two modules for depth-weighted and edge-weighted feature fusion. These modules are integrated into the pose regression network to enhance pose prediction. Furthermore, we employ a self-attention module for high-level feature extraction to increase the network's capacity. Extensive experiments on the publicly available 7Scenes and 12Scenes datasets demonstrate that the proposed method achieves high localization accuracy, with average positional errors of 0.19 m and 0.16 m, respectively. The code for this work is available at https://github.com/lqing900205/structureLoc.
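
The abstract does not spell out the architecture; the released repository is the authoritative reference. As a rough illustration of the idea only, the following is a minimal PyTorch sketch of structure-weighted fusion followed by self-attention and pose regression. All names (WeightedFusion, StructurePoseNet), the toy two-layer backbone, and the sigmoid gating scheme are hypothetical stand-ins, not the authors' modules.

```python
import torch
import torch.nn as nn


class WeightedFusion(nn.Module):
    """Hypothetical fusion block: a single-channel structure cue (depth or
    edge map) is mapped to per-pixel gates that reweight the RGB features."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(1, channels, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat: torch.Tensor, cue: torch.Tensor) -> torch.Tensor:
        # cue: (B, 1, H, W), resized to the feature resolution by the caller
        return feat + self.merge(feat * self.gate(cue))  # residual fusion


class StructurePoseNet(nn.Module):
    """Toy backbone + depth-/edge-weighted fusion + self-attention +
    a 7-DoF pose head (3-D translation, unit quaternion)."""

    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_fusion = WeightedFusion(channels)
        self.edge_fusion = WeightedFusion(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fc_t = nn.Linear(channels, 3)  # translation
        self.fc_q = nn.Linear(channels, 4)  # orientation (quaternion)

    def forward(self, rgb, depth, edge):
        f = self.backbone(rgb)
        # Bring the structure cues down to the feature-map resolution.
        size = f.shape[-2:]
        depth = nn.functional.interpolate(depth, size=size, mode="bilinear",
                                          align_corners=False)
        edge = nn.functional.interpolate(edge, size=size, mode="bilinear",
                                         align_corners=False)
        f = self.depth_fusion(f, depth)
        f = self.edge_fusion(f, edge)
        # Self-attention over flattened spatial tokens for high-level features.
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)           # (B, H*W, C)
        tokens, _ = self.attn(tokens, tokens, tokens)
        v = tokens.mean(dim=1)                          # global pooling
        q = nn.functional.normalize(self.fc_q(v), dim=1)  # unit quaternion
        return torch.cat([self.fc_t(v), q], dim=1)      # (B, 7) pose


if __name__ == "__main__":
    net = StructurePoseNet()
    rgb = torch.randn(2, 3, 256, 256)
    depth = torch.rand(2, 1, 256, 256)  # e.g. from a monocular depth estimator
    edge = torch.rand(2, 1, 256, 256)   # e.g. from an edge detector
    print(net(rgb, depth, edge).shape)  # torch.Size([2, 7])
```

The gating design reflects the abstract's description of "depth-weighted" and "edge-weighted" fusion in the loosest sense: the structure map decides, per location, how strongly the corresponding RGB features contribute. How the paper actually weights, merges, and supervises these streams should be checked against the linked code.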
