Abstract
Deep learning (DL) based localization and Simultaneous Localization and Mapping (SLAM) has recently gained considerable attention demonstrating remarkable results. Instead of constructing hand-crafted algorithms through geometric theories, DL based solutions provide a data-driven solution to the problem. Taking advantage of large amounts of training data and computing capacity, these approaches are increasingly developing into a new field that offers accurate and robust localization systems. In this work, the problem of global localization for unmanned aerial vehicles (UAVs) is analyzed by proposing a sequential, end-to-end, and multimodal deep neural network based monocular visual-inertial localization framework. More specifically, the proposed neural network architecture is three-fold; a visual feature extractor convNet network, a small IMU integrator bi-directional long short-term memory (LSTM), and a global pose regressor bi-directional LSTM network for pose estimation. In addition, by fusing the traditional IMU filtering methods instead of LSTM with the convNet, a more time-efficient deep pose estimation framework is presented. It is worth pointing out that the focus in this study is to evaluate the precision and efficiency of visual-inertial (VI) based localization approaches concerning indoor scenarios. The proposed deep global localization is compared with the various state-of-the-art algorithms on indoor UAV datasets, simulation environments and real-world drone experiments in terms of accuracy and time-efficiency. In addition, the comparison of IMU-LSTM and IMU-Filter based pose estimators is also provided by a detailed analysis. Experimental results show that the proposed filter-based approach combined with a DL approach has promising performance in terms of accuracy and time efficiency in indoor localization of UAVs.
Highlights
Unmanned Aerial Vehicles (UAVs) are capable of completing a wide range of applications, such as tracking, facility inspection, supply distribution, mapping, etc
It is worth pointing out that the focus in this study is to evaluate the precision and efficiency of visual-inertial (VI) based localization approaches with respect to indoor scenarios, by comparing visual inertial odometry (VIO) algorithms, including direct-based and feature-based and, in particular, data-driven-based algorithms (DL approaches)
We propose a multimodal recurrent neural network (RNN) architecture that allows the Deep learning (DL)-based global localization to be generalized to unseen and new environments via the visual and temporal feature extraction trained on convolutional neural networks (CNN) and long short-term memory (LSTM)
Summary
Unmanned Aerial Vehicles (UAVs) are capable of completing a wide range of applications, such as tracking, facility inspection, supply distribution, mapping, etc. A precise estimation of the UAVs pose is essential to ensure a high level of safety in autonomous operations. The capability of an autonomous agent to accurately estimate its pose is known as Localization in mobile robotics [1], [2] and global localization is its ability to retrieve its global pose in a known scene with prior knowledge [3]. Among several localization methods available, the community has drawn considerable attention to camera-based solutions [4] due to their low-cost, portability, simple hardware set-up and ability to give rich information about the scene. The ORB-SLAM [11] and the DSO [9], two members of feature-based [11]–[13] and direct-based VO [9], [14], [15] methods, respectively, achieve real-time output on CPUs and are both extremely accurate in the normal large-scale environment
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.