Abstract

This paper considers the problem of unmanned aerial vehicle (UAV) path planning. Traditional path planning algorithms suffer from low efficiency and poor adaptability, so this paper uses reinforcement learning to perform the path planning. In the classic proximal policy optimization (PPO) algorithm, samples with large rewards in the experience replay buffer can seriously affect training, which degrades the agent's exploration performance and leads to poor convergence in some path planning tasks. To solve these problems, this paper proposes a frequency decomposition PPO algorithm (FD-PPO) and designs a heuristic reward function for the UAV path planning problem. The FD-PPO algorithm decomposes rewards into multi-dimensional frequency rewards and then calculates the frequency return to efficiently guide the UAV through the path planning task. Simulation results show that the proposed FD-PPO algorithm adapts to complex environments and shows outstanding stability under continuous state and action spaces. The FD-PPO algorithm also outperforms the PPO algorithm in path planning.
