Abstract

This paper addresses the problem of using deep reinforcement learning (DRL) to enable safe navigation of UAVs in unknown large-scale environments containing many obstacles. The ability of a UAV to navigate autonomously is a precondition for tasks such as disaster rescue and logistics delivery. In this paper, the perception-constrained navigation of a UAV is modeled as a partially observable Markov decision process (POMDP), and a fast recurrent stochastic value gradient algorithm based on the actor-critic framework is designed to solve it online. Compared with traditional SLAM-based and reactive obstacle-avoidance approaches, the proposed navigation method maps raw sensing information directly to navigation signals, and the learned policy deployed on the UAV avoids the memory and computation required for real-time map building. Experiments in a newly designed simulation environment demonstrate that the proposed algorithm enables the UAV to navigate safely in unknown, cluttered, large-scale environments and outperforms state-of-the-art DRL algorithms.
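
To make the algorithmic setting concrete, the following is a minimal, hypothetical sketch of a recurrent actor-critic of the kind the abstract describes: a recurrent unit maintains a belief state over the partially observed environment, the actor maps that belief to a stochastic navigation command, and actions are sampled via reparameterisation so that value gradients can flow through them (as in stochastic value gradient methods). It is not the paper's implementation; all dimensions, layer choices, and names (`RecurrentActorCritic`, `obs_dim`, `action_dim`) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """Hypothetical recurrent actor-critic (illustrative only).

    A GRU cell carries a belief state over the POMDP; the actor head
    outputs a navigation-command distribution, the critic head a value.
    """

    def __init__(self, obs_dim=64, hidden_dim=128, action_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)
        self.actor_mean = nn.Linear(hidden_dim, action_dim)  # command mean
        self.actor_logstd = nn.Parameter(torch.zeros(action_dim))
        self.critic = nn.Linear(hidden_dim, 1)               # value estimate

    def forward(self, obs, h):
        z = self.encoder(obs)
        h = self.gru(z, h)                  # update belief over hidden state
        mean = torch.tanh(self.actor_mean(h))
        std = self.actor_logstd.exp().expand_as(mean)
        value = self.critic(h)
        return mean, std, value, h

# One rollout step: sample a stochastic action with the reparameterisation
# trick, so gradients of the value can propagate back through the action.
model = RecurrentActorCritic()
obs = torch.randn(1, 64)      # raw range/depth sensing vector (assumed)
h = torch.zeros(1, 128)       # initial recurrent belief state
mean, std, value, h = model(obs, h)
action = mean + std * torch.randn_like(std)  # reparameterised sample
```

Because the recurrent state summarises the observation history, the deployed policy needs no explicit map, which is the source of the memory and computation savings claimed over SLAM-based pipelines.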
