Abstract

This paper addresses the problem of using deep reinforcement learning (DRL) to enable safe navigation of unmanned aerial vehicles (UAVs) in unknown, large-scale environments cluttered with obstacles. The ability to navigate autonomously is a precondition for UAVs to perform tasks such as disaster rescue and logistics delivery. In this paper, perception-constrained UAV navigation is modeled as a partially observable Markov decision process (POMDP), and a fast recurrent stochastic value gradient algorithm based on the actor-critic framework is designed to solve it online. In contrast to traditional SLAM-based and reactive obstacle-avoidance approaches, the proposed method maps raw sensing information directly to navigation signals, so the learned policy deployed on the UAV avoids the memory occupation and computational cost that real-time map building requires. Experiments in a newly designed simulation environment demonstrate that the proposed algorithm enables the UAV to navigate safely in unknown, cluttered, large-scale environments and outperforms state-of-the-art DRL algorithms.
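
To make the architecture concrete, the sketch below is a minimal, illustrative PyTorch model, not the authors' implementation: a recurrent actor-critic that maps a sequence of raw range observations to a stochastic navigation command and a value estimate, with a GRU carrying a belief state across partial observations as the POMDP setting suggests. All identifiers and dimensions (OBS_DIM, ACT_DIM, HID_DIM) are assumptions chosen for illustration.

    # Illustrative sketch only; the paper's exact network and algorithm are not
    # specified here. Dimensions and names are assumed, not taken from the paper.
    import torch
    import torch.nn as nn

    OBS_DIM = 40   # assumed: a 1-D depth/range scan from the onboard sensor
    ACT_DIM = 3    # assumed: e.g. forward speed, climb rate, yaw rate
    HID_DIM = 128  # assumed recurrent state size

    class RecurrentActorCritic(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(OBS_DIM, HID_DIM), nn.ReLU())
            # The GRU integrates partial observations over time, standing in
            # for the map that SLAM-based methods would build explicitly.
            self.gru = nn.GRU(HID_DIM, HID_DIM, batch_first=True)
            # Actor head: mean of a Gaussian policy over continuous commands.
            self.mu = nn.Linear(HID_DIM, ACT_DIM)
            self.log_std = nn.Parameter(torch.zeros(ACT_DIM))
            # Critic head: value estimate from the recurrent belief state.
            self.value = nn.Linear(HID_DIM, 1)

        def forward(self, obs_seq, hidden=None):
            # obs_seq: (batch, time, OBS_DIM); hidden is the belief state.
            feat = self.encoder(obs_seq)
            out, hidden = self.gru(feat, hidden)
            dist = torch.distributions.Normal(self.mu(out), self.log_std.exp())
            return dist, self.value(out), hidden

    if __name__ == "__main__":
        net = RecurrentActorCritic()
        obs = torch.randn(1, 8, OBS_DIM)     # one trajectory of 8 time steps
        dist, value, h = net(obs)
        action = torch.tanh(dist.sample())   # squash to bounded commands
        print(action.shape, value.shape)     # (1, 8, 3) and (1, 8, 1)

Because the policy is a fixed-size network rather than a map, the onboard memory and compute at deployment stay constant regardless of environment size, which is the advantage over real-time map building that the abstract claims.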
