Path planning is a key technology for the autonomous flight of unmanned aerial vehicles (UAVs). Traditional path planning algorithms suffer from limitations in complex and dynamic environments. In this article, we propose a deep reinforcement learning approach for three-dimensional path planning that relies only on local environmental information and relative distance, without global information. In practical scenarios, a UAV with limited sensing capability can observe only the environment in its immediate vicinity, so path planning can be formulated as a Partially Observable Markov Decision Process (POMDP). We construct a recurrent neural network with temporal memory to address the partial observability problem by extracting crucial information from historical state-action sequences. We also develop an action selection strategy that combines the current reward value with the state-action value to reduce meaningless exploration. In addition, we construct two sample memory pools and propose an adaptive experience replay mechanism based on the frequency of failure. Simulation results show that our method significantly improves on Deep Q-Network (DQN) and Deep Recurrent Q-Network (DRQN) in terms of stability and learning efficiency. Our approach successfully plans reasonable three-dimensional paths in a large-scale, complex environment and avoids obstacles effectively in unknown environments.
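Since the abstract only outlines the adaptive experience replay mechanism, the following minimal Python sketch illustrates one plausible reading of a two-pool replay buffer whose sampling mix adapts to the recent failure frequency. All names (`AdaptiveReplay`, `failure_ratio`, the sampling split) and the exact mixing rule are our own illustrative assumptions, not the paper's method.

```python
import random
from collections import deque


class AdaptiveReplay:
    """Sketch of a two-pool experience replay whose sampling mix
    adapts to the recent failure frequency. The names and the exact
    mixing rule are illustrative assumptions, not the paper's."""

    def __init__(self, capacity=50_000, window=200):
        self.success_pool = deque(maxlen=capacity)   # transitions from successful episodes
        self.failure_pool = deque(maxlen=capacity)   # transitions from failed episodes
        self.recent_outcomes = deque(maxlen=window)  # 1 = failure, 0 = success

    def store_episode(self, transitions, failed):
        # Route the whole episode's transitions into the matching pool
        # and record the outcome for the failure-frequency estimate.
        pool = self.failure_pool if failed else self.success_pool
        pool.extend(transitions)
        self.recent_outcomes.append(1 if failed else 0)

    def failure_ratio(self):
        # Fraction of recent episodes that failed; 0.5 before any history
        # so both pools are sampled evenly at the start of training.
        if not self.recent_outcomes:
            return 0.5
        return sum(self.recent_outcomes) / len(self.recent_outcomes)

    def sample(self, batch_size):
        # Draw more failure transitions when the agent is failing often,
        # so hard cases (collisions, dead ends) are replayed more.
        n_fail = min(round(batch_size * self.failure_ratio()), len(self.failure_pool))
        n_succ = min(batch_size - n_fail, len(self.success_pool))
        batch = (random.sample(list(self.failure_pool), n_fail)
                 + random.sample(list(self.success_pool), n_succ))
        random.shuffle(batch)
        return batch
```

The design intuition is that a high recent failure rate shifts replay toward failed transitions, focusing learning on the situations the agent handles worst; as performance improves, sampling naturally drifts back toward successful trajectories.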