Abstract

Deep learning-based camera pose regression approaches have achieved outstanding performance for visual indoor localisation. However, these approaches are limited by the availability of images with known camera poses, and they often require comprehensive mapping of indoor scenes, which is labour-intensive and frequently impractical. Recent studies have shown that synthetic images derived from simple 3D building models can be used to train deep learning models for cross-domain synthetic-to-real visual localisation. Yet the performance of such cross-domain localisation models degrades because of the domain gap between real and synthetic images. In this study, we propose a domain adaptation approach based on a Generative Adversarial Network (GAN) framework, trained on a set of unpaired synthetic and real images to generate real-looking synthetic images and synthetic-looking real images. These images are then used to train current deep learning-based camera pose regression networks. We develop two domain adaptation strategies for the localisation task: synthetic-to-real and real-to-synthetic image style transfer. The results show that the best model is obtained with real-to-synthetic style transfer, yielding a localisation error of 0.31 m and an orientation error of 3.36°, approximately three times lower than previously reported results.
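The abstract does not specify the exact architectures, so the following is a minimal, hypothetical PyTorch sketch of the described pipeline, assuming a CycleGAN-style unpaired translator (a generator `G` for synthetic-to-real and `Fm` for real-to-synthetic, trained with least-squares adversarial and cycle-consistency losses) and a PoseNet-style regressor with a weighted position/orientation loss. All network definitions, loss weights, and names here are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_net(in_ch, out_ch):
    # Tiny stand-in for a full generator/discriminator backbone (assumption).
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1),
    )

class PoseRegressor(nn.Module):
    # PoseNet-style head: 3-D position plus 4-D unit quaternion per image.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.xyz = nn.Linear(64, 3)   # position in metres
        self.quat = nn.Linear(64, 4)  # orientation as a quaternion

    def forward(self, x):
        h = self.features(x)
        return self.xyz(h), F.normalize(self.quat(h), dim=1)

# Unpaired translators: G maps synthetic->real style, Fm maps real->synthetic.
G, Fm = conv_net(3, 3), conv_net(3, 3)
D_real, D_syn = conv_net(3, 1), conv_net(3, 1)  # PatchGAN-like critics
pose_net = PoseRegressor()

def translation_losses(syn, real):
    # Least-squares adversarial terms plus cycle consistency, as in CycleGAN;
    # the cycle weight 10.0 is illustrative, not taken from the paper.
    fake_real, fake_syn = G(syn), Fm(real)
    adv = F.mse_loss(D_real(fake_real), torch.ones_like(D_real(fake_real))) \
        + F.mse_loss(D_syn(fake_syn), torch.ones_like(D_syn(fake_syn)))
    cyc = F.l1_loss(Fm(fake_real), syn) + F.l1_loss(G(fake_syn), real)
    return adv + 10.0 * cyc

def pose_loss(images, t_gt, q_gt, beta=500.0):
    # Weighted position/orientation loss used by PoseNet-style regressors.
    t, q = pose_net(images)
    return F.mse_loss(t, t_gt) + beta * F.mse_loss(q, q_gt)

# Strategy 1 (synthetic-to-real): train the regressor on G(syn) using the
# synthetic renders' known poses. Strategy 2 (real-to-synthetic): train on raw
# synthetic renders and localise translated real queries Fm(real) at test time.
syn = torch.randn(4, 3, 64, 64)   # dummy batch of synthetic renders
real = torch.randn(4, 3, 64, 64)  # dummy unpaired batch of real images
t_gt, q_gt = torch.randn(4, 3), F.normalize(torch.randn(4, 4), dim=1)
loss = translation_losses(syn, real) + pose_loss(G(syn), t_gt, q_gt)
loss.backward()
```

In a real training setup the discriminators and generators would be updated with separate, opposing optimisation steps, and the pose regressor would typically be trained after (or alternated with) the translator; both are collapsed into a single combined loss here purely to keep the sketch short.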
