Abstract

Image-based relocalization has attracted renewed interest in outdoor environments because it is an important problem with many applications. PoseNet was the first method to use a Convolutional Neural Network (CNN) to estimate the camera pose from a single image in real time. To improve the precision and robustness of PoseNet and its variants in complex environments, this paper proposes and implements a new visual relocalization method based on deep convolutional neural networks (VNLSTM-PoseNet). First, the method resizes the input image directly, without cropping, to increase the receptive field of the training images. Then, the images and their corresponding pose labels are fed into an improved Long Short-Term Memory (LSTM)-based PoseNet network for training, and the network is optimized with the Nadam optimizer. Finally, the trained network is used to localize images and obtain the camera pose. Experimental results on public outdoor datasets show that VNLSTM-PoseNet substantially improves relocalization performance over existing state-of-the-art CNN-based methods.
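The first step above (resizing without cropping) can be illustrated with a minimal pure-Python sketch. For comparison it assumes a PoseNet-style baseline that scales the shorter image side to 256 pixels and then takes a 224×224 centre crop; these sizes, the hypothetical 1920×1080 frame, and the helper names `crop_fraction` and `resize_nearest` are illustrative assumptions, not settings stated by the paper. The point is only that a direct resize samples the whole frame, while a centre crop discards a large share of it.

```python
def crop_fraction(width, height, short_side=256, crop=224):
    """Fraction of the original image area kept by resize-then-centre-crop.

    Hypothetical baseline: scale so the shorter side equals `short_side`,
    then cut a `crop` x `crop` window from the centre.
    """
    scale = short_side / min(width, height)
    scaled_w, scaled_h = width * scale, height * scale
    kept = min(crop, scaled_w) * min(crop, scaled_h)
    return kept / (scaled_w * scaled_h)


def resize_nearest(image, out_h, out_w):
    """Resize a 2-D list of pixels to out_h x out_w without cropping.

    Nearest-neighbour sampling at pixel centres: every output pixel maps
    back into the full source frame, so no border region is discarded.
    """
    in_h, in_w = len(image), len(image[0])
    rows = [int((r + 0.5) * in_h / out_h) for r in range(out_h)]
    cols = [int((c + 0.5) * in_w / out_w) for c in range(out_w)]
    return [[image[r][c] for c in cols] for r in rows]


if __name__ == "__main__":
    # Hypothetical 1920x1080 frame: the centre crop keeps under half the scene.
    print(f"crop keeps {crop_fraction(1920, 1080):.1%} of the frame")
    # Direct resize of a toy 4x6 "image" samples across the whole frame.
    toy = [[10 * r + c for c in range(6)] for r in range(4)]
    print(resize_nearest(toy, 2, 3))
```

A real pipeline would use a library resize with interpolation rather than nearest-neighbour sampling; note that resizing a non-square frame straight to a square input also changes its aspect ratio, which is the trade-off for keeping the full view.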

Highlights

  • We present a new deep ConvNet architecture that addresses the major challenge of image-based camera relocalization in urban streets using only RGB images.

  • To obtain better network parameters, the deep ConvNet is optimized with the Nadam optimizer, implemented in the PyTorch framework.
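Nadam combines Adam's per-parameter scaling by a running second-moment estimate with Nesterov-style look-ahead momentum. The sketch below applies a simplified Nadam update (without a momentum-decay schedule) to a one-dimensional quadratic; `nadam_minimize` and the toy objective are illustrative assumptions, not the paper's training code. PyTorch's `torch.optim.NAdam` additionally schedules the momentum coefficient, so its individual steps differ slightly from this sketch.

```python
import math


def nadam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with a simplified Nadam (Nesterov-accelerated Adam)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g       # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        # Nesterov look-ahead: bias-corrected momentum for the next step,
        # blended with the bias-corrected current gradient.
        m_hat = beta1 * m / (1 - beta1 ** (t + 1)) + (1 - beta1) * g / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    # Toy objective f(x) = (x - 3)^2 with gradient 2(x - 3); minimum at x = 3.
    x_star = nadam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
    print(f"x after Nadam: {x_star:.4f}")
```

In a network the same update runs element-wise over every weight tensor, with `grad_fn` replaced by backpropagated gradients of the pose loss.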

Summary

Introduction

Image-based camera relocalization is a basic problem in many computer vision applications, such as autonomous driving, mobile robotics, Augmented Reality (AR), pedestrian visual positioning, and Structure from Motion (SfM) (Li et al 2020b; Tateno et al 2017; Asadi et al 2019; Liu et al 2020; Acharya et al 2019a; Niu et al 2019). It refers to estimating the camera's pose, that is, its position and orientation, from an image. Complex conditions in real environments, such as object occlusion, viewpoint changes, motion blur, illumination changes, and lack of texture, can degrade feature matching, making it difficult to obtain accurate camera poses or to localize successfully.
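Because the pose combines a position and an orientation, relocalization accuracy is commonly reported as two separate errors: the Euclidean distance between the estimated and ground-truth camera positions, and the rotation angle between the two orientations. A minimal sketch, assuming orientations are quaternions in a consistent component order, as regressed by PoseNet-style networks; `position_error` and `orientation_error_deg` are hypothetical helpers, not part of the paper's code.

```python
import math


def position_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions."""
    return math.dist(t_est, t_gt)


def orientation_error_deg(q_est, q_gt):
    """Angle in degrees of the rotation between two quaternion orientations.

    Both quaternions are normalised first, since a regressed quaternion is
    generally not unit length. q and -q encode the same rotation, hence abs().
    """
    n_est = math.sqrt(sum(c * c for c in q_est))
    n_gt = math.sqrt(sum(c * c for c in q_gt))
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt))) / (n_est * n_gt)
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


if __name__ == "__main__":
    half = math.radians(45.0)
    q_identity = (1.0, 0.0, 0.0, 0.0)
    q_yaw90 = (math.cos(half), 0.0, 0.0, math.sin(half))  # 90 deg about z
    print(position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    print(orientation_error_deg(q_identity, q_yaw90))
```

Per-image errors like these are typically summarized over a test set (for example by their medians) when comparing relocalization methods.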
