Abstract

Precise localization and pose estimation in indoor environments are required by a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be addressed via visual-based localization using a pre-built 3D model. The enlarged search space associated with large scenes can be overcome by first retrieving candidate images and subsequently estimating the pose. However, the majority of current deep learning-based image retrieval methods require labeled data, which increases data annotation costs and complicates data acquisition. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised variational autoencoder (VAE) network with a visual-based Structure-from-Motion (SfM) approach in order to extract global and local features. During localization, global features are used for image retrieval at the scene-map level to obtain candidate images, and local features are subsequently used to estimate the pose from 2D-3D matches between the query and candidate images. Only RGB images are used as input to the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method localizes images within 0.16 m and 4° on the 7-Scenes data sets, and localizes 32.8% of images within 5 m and 20° on the Baidu data set. Furthermore, our proposed method achieves higher precision than advanced methods.
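The retrieval stage described above reduces the search space by comparing a query image's global descriptor against those of the database images. Below is a minimal NumPy sketch of that step, assuming global descriptors are fixed-length vectors (e.g., VAE latent codes) compared by cosine similarity; the function name, shapes, and data are illustrative, not taken from the paper.

```python
import numpy as np

def retrieve_candidates(query_desc, db_descs, k=5):
    """Return indices of the k database images whose global descriptors
    are most similar to the query under cosine similarity."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                 # cosine similarity per database image
    return np.argsort(-sims)[:k]  # indices of the k best candidates

# Illustrative usage: 100 database images with 64-dimensional descriptors.
rng = np.random.default_rng(0)
db_descs = rng.normal(size=(100, 64))
query = db_descs[42] + 0.01 * rng.normal(size=64)  # near-duplicate of image 42
candidates = retrieve_candidates(query, db_descs, k=5)
```

In the full pipeline, the descriptors would come from the trained encoder rather than random vectors, and the resulting candidate set is handed to the local-feature matching stage for pose estimation.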

Highlights

  • Visual-based localization is an alternative localization method that requires only a pre-built model and a single camera, without any other external devices

  • Researchers employ indoor signal transceiving devices such as Bluetooth beacons [3], Wireless Fidelity (Wi-Fi) [4], Digital Enhanced Cordless Telecommunications (DECT) [5], and Radio Frequency Identification (RFID) [6]; these external devices must be placed in the environment in advance, resulting in additional installation and maintenance costs

  • This can typically be accomplished by direct image-based localization via a 3D model based on sparse feature points from simultaneous localization and mapping (SLAM) [9,10] or Structure from Motion (SfM) [11]


Summary

Introduction

Visual-based localization is an alternative localization method that requires only a pre-built model and a single camera, without any other external devices. Vision-based localization can be applied to 3-Degree-of-Freedom (3DoF) positioning services, as well as to applications such as intelligent robots [7] and virtual reality [8] that require high-precision 6DoF pose estimation. This can typically be accomplished by direct image-based localization against a 3D model built from sparse feature points via simultaneous localization and mapping (SLAM) [9,10] or Structure from Motion (SfM) [11].
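The 6DoF pose estimation step mentioned above recovers the camera rotation and translation from correspondences between 2D image points and 3D model points. The sketch below uses the classical Direct Linear Transform (DLT), assuming known camera intrinsics K and noise-free matches; it is not the paper's solver, and production pipelines would typically run a robust PnP method inside RANSAC instead.

```python
import numpy as np

def pose_from_2d3d(pts3d, pts2d, K):
    """Estimate camera rotation R and translation t from n >= 6
    2D-3D correspondences via the Direct Linear Transform (DLT)."""
    n = len(pts3d)
    # Normalize pixel coordinates with the inverse intrinsics.
    x = (np.linalg.inv(K) @ np.column_stack([pts2d, np.ones(n)]).T).T
    X = np.column_stack([pts3d, np.ones(n)])  # homogeneous 3D points
    A = np.zeros((2 * n, 12))
    for i, (Xi, xi) in enumerate(zip(X, x)):
        A[2 * i, 0:4] = Xi
        A[2 * i, 8:12] = -xi[0] * Xi
        A[2 * i + 1, 4:8] = Xi
        A[2 * i + 1, 8:12] = -xi[1] * Xi
    # The projection matrix is the null vector of A (last right singular vector).
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    # Project the left 3x3 block onto the rotation group and recover the scale.
    U, s, Vt = np.linalg.svd(P[:, :3])
    R, scale = U @ Vt, s.mean()
    if np.linalg.det(R) < 0:  # fix the overall sign ambiguity of P
        R, scale = -R, -scale
    return R, P[:, 3] / scale

# Synthetic check: project known 3D points with a known pose, then recover it.
rng = np.random.default_rng(1)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
c, s_ = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 4.0])
pts3d = rng.uniform(-1.0, 1.0, size=(8, 3))
cam = pts3d @ R_true.T + t_true            # points in the camera frame
pts2d = (cam @ K.T)[:, :2] / cam[:, 2:3]   # perspective projection to pixels

R_est, t_est = pose_from_2d3d(pts3d, pts2d, K)
```

With noisy real matches, the same 2D-3D formulation is usually solved with a minimal PnP solver plus RANSAC to reject outlier correspondences, followed by nonlinear refinement.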

