Abstract

Deep learning (DL) based localization and Simultaneous Localization and Mapping (SLAM) have recently gained considerable attention, demonstrating remarkable results. Instead of constructing hand-crafted algorithms from geometric theory, DL-based methods offer a data-driven solution to the problem. Taking advantage of large amounts of training data and computing capacity, these approaches are developing into a new field that offers accurate and robust localization systems. In this work, the problem of global localization for unmanned aerial vehicles (UAVs) is addressed with a sequential, end-to-end, multimodal deep neural network for monocular visual-inertial localization. More specifically, the proposed architecture has three components: a convolutional network (ConvNet) that extracts visual features, a small bidirectional long short-term memory (LSTM) network that integrates inertial measurement unit (IMU) data, and a bidirectional LSTM that regresses the global pose. In addition, by fusing a traditional IMU filtering method with the ConvNet in place of the IMU LSTM, a more time-efficient deep pose estimation framework is presented. The focus of this study is to evaluate the precision and efficiency of visual-inertial (VI) localization approaches in indoor scenarios. The proposed deep global localization is compared with various state-of-the-art algorithms on indoor UAV datasets, in simulation environments, and in real-world drone experiments, in terms of accuracy and time efficiency. A detailed comparison of the IMU-LSTM and IMU-filter based pose estimators is also provided. Experimental results show that the proposed filter-based approach, combined with the DL pipeline, delivers promising accuracy and time efficiency for indoor localization of UAVs.
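To make the three-component pipeline concrete, the following is a minimal PyTorch sketch of such an architecture. It is an illustrative reconstruction rather than the authors' implementation: the module names (VisualEncoder, DeepVILocalizer), the layer sizes, and the 7-dimensional pose output (3-DoF translation plus a unit quaternion) are assumptions made for this example.

```python
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    """Small ConvNet mapping a monocular frame to a feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, img):  # img: (B, 3, H, W)
        return self.fc(self.conv(img).flatten(1))

class DeepVILocalizer(nn.Module):
    """ConvNet visual extractor + bi-LSTM IMU integrator + bi-LSTM pose regressor."""
    def __init__(self, feat_dim=256, imu_hidden=64, reg_hidden=256):
        super().__init__()
        self.visual = VisualEncoder(feat_dim)
        # Small bi-LSTM summarizing the raw IMU stream (gyro + accel = 6 channels).
        self.imu_lstm = nn.LSTM(6, imu_hidden, batch_first=True, bidirectional=True)
        # Global pose regressor over the fused visual-inertial sequence.
        self.pose_lstm = nn.LSTM(feat_dim + 2 * imu_hidden, reg_hidden,
                                 batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * reg_hidden, 7)  # 3-DoF translation + quaternion

    def forward(self, imgs, imu):
        # imgs: (B, T, 3, H, W); imu: (B, T, N, 6), N IMU samples per frame.
        B, T = imgs.shape[:2]
        vis = self.visual(imgs.flatten(0, 1)).view(B, T, -1)
        imu_out, _ = self.imu_lstm(imu.flatten(0, 1))
        imu_feat = imu_out[:, -1].view(B, T, -1)  # last step summarizes each window
        fused, _ = self.pose_lstm(torch.cat([vis, imu_feat], dim=-1))
        return self.head(fused)  # (B, T, 7): one global pose per frame
```

In the filter-based variant, the IMU bi-LSTM above would be replaced by a classical integration step whose fixed-size output is concatenated with the ConvNet features instead of imu_feat. Below is a minimal sketch of such strapdown integration; the quaternion convention (w, x, y, z) is an assumption, and gravity compensation and bias estimation are omitted for brevity.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of two (w, x, y, z) quaternions."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q, i.e. q * (0, v) * q^-1."""
    conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, np.array([0.0, *v])), conj)[1:]

def integrate_imu(gyro, accel, dt):
    """Integrate one IMU window (N x 3 gyro [rad/s], N x 3 accel [m/s^2]) into a
    7-D feature: relative orientation quaternion plus velocity change.
    Gravity compensation and bias handling are omitted for brevity."""
    q = np.array([1.0, 0.0, 0.0, 0.0])  # identity orientation
    dv = np.zeros(3)
    for w, a in zip(gyro, accel):
        # First-order quaternion update: q <- normalize(q + 0.5 * q * (0, w) * dt)
        q = q + 0.5 * quat_mul(q, np.array([0.0, *w])) * dt
        q /= np.linalg.norm(q)
        # Rotate specific force into the window's initial frame and accumulate.
        dv += quat_rotate(q, a) * dt
    return np.concatenate([q, dv])
```

Because this integration step involves no learned parameters, the filter-based variant drops one recurrent network at inference time, which is the source of the time-efficiency gain claimed in the abstract.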

Highlights

  • Unmanned Aerial Vehicles (UAVs) are capable of completing a wide range of applications, such as tracking, facility inspection, supply distribution, and mapping

  • The focus of this study is to evaluate the precision and efficiency of visual-inertial (VI) localization approaches in indoor scenarios by comparing visual-inertial odometry (VIO) algorithms, including direct, feature-based, and, in particular, data-driven (DL) approaches

  • We propose a multimodal recurrent neural network (RNN) architecture that allows deep learning (DL) based global localization to generalize to new, unseen environments through visual and temporal feature extraction with convolutional neural networks (CNNs) and long short-term memory (LSTM) networks

Introduction

Unmanned Aerial Vehicles (UAVs) are capable of completing a wide range of applications, such as tracking, facility inspection, supply distribution, and mapping. A precise estimate of the UAV's pose is essential to ensure a high level of safety in autonomous operations. The capability of an autonomous agent to accurately estimate its pose is known as localization in mobile robotics [1], [2], and global localization is its ability to recover its global pose in a known scene with prior knowledge [3]. Among the several localization methods available, the community has paid considerable attention to camera-based solutions [4] due to their low cost, portability, simple hardware setup, and ability to provide rich information about the scene. ORB-SLAM [11] and DSO [9], representative feature-based [11]–[13] and direct [9], [14], [15] visual odometry (VO) methods, respectively, run in real time on a CPU and are both highly accurate in typical large-scale environments.
