Abstract

Obstacle avoidance planning has always been an essential technology for underwater operations by Autonomous Underwater Vehicles (AUVs). In complex underwater obstacle environments, planning a path that balances computational efficiency and energy consumption is of great significance to underwater operations. The rapidly-exploring random tree star (RRT*) method was proposed in recent years to solve the path optimization problem. Compared with the original rapidly-exploring random tree (RRT), RRT* gradually optimizes the path, reduces useless node memory storage, and greatly improves search efficiency. However, since the RRT* algorithm is essentially a random sampling extension, it suffers from weak goal orientation, so the cost of exploring ineffective areas is high. In this paper, an RRT* algorithm driven by reinforcement learning (RL-RRT*) is used to reduce the cost of exploring invalid areas. This method uses Q-Learning to optimize the random tree expansion process of RRT*, which not only preserves the random exploratory property of RRT* in unknown environments, but also uses Q-Learning to reduce the exploration cost of invalid regions. Specifically, while satisfying the AUV kinematics model, the method designs a reward function for the extended node, a variable probability parameter for biasing toward the target, and a dynamic step function; together these reduce invalid nodes, accelerate the exploration process, and improve path planning efficiency. In simulation experiments, the method is applied in two unknown maze environments. The experimental results demonstrate the feasibility of the RL-RRT* algorithm and its advantages in efficiency and performance.
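The abstract names three ingredients of the expansion strategy: a reward on extended nodes, a variable goal-bias probability, and a dynamic step size. The abstract gives no implementation details, so the following is a minimal Python sketch of how a Q-table can bias the growth of a sampling-based tree in 2D; all reward values, the epsilon and goal-bias parameters, the step rule, and the grid discretization are illustrative assumptions rather than the authors' settings, and the RRT* choose-parent/rewire steps and the AUV kinematics constraints are omitted for brevity.

```python
import math
import random

# Illustrative sketch: Q-Learning-biased tree expansion toward a goal.
# Not the paper's implementation; parameters and rewards are assumptions.

GOAL = (9.0, 9.0)
OBSTACLES = [((4.0, 4.0), 1.5)]  # circular obstacles as (center, radius)
ACTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
           for k in range(8)]    # 8 compass headings as unit vectors

def cell(p):
    # Discretize the continuous state so it can index the Q-table.
    return (int(p[0]), int(p[1]))

def collides(p):
    return any(math.dist(p, c) < r for c, r in OBSTACLES)

Q = {}  # (cell, action_index) -> value
def q(s, a):
    return Q.get((s, a), 0.0)

def expand(tree, eps=0.2, goal_bias=0.1, alpha=0.5, gamma=0.9):
    # Pick an existing node to extend (nearest-neighbor search omitted).
    node = random.choice(tree)
    s = cell(node)
    a = None
    if random.random() < goal_bias:
        # Variable goal bias: occasionally steer straight at the goal.
        dx, dy = GOAL[0] - node[0], GOAL[1] - node[1]
        d = math.hypot(dx, dy) or 1.0
        direction = (dx / d, dy / d)
    elif random.random() < eps:
        # Keep RRT's random exploratory behavior.
        a = random.randrange(len(ACTIONS))
        direction = ACTIONS[a]
    else:
        # Exploit the learned Q-values for this cell.
        a = max(range(len(ACTIONS)), key=lambda i: q(s, i))
        direction = ACTIONS[a]
    # Dynamic step: shrink the step as the goal gets close (assumed rule).
    step = min(1.0, 0.5 * math.dist(node, GOAL) + 0.1)
    new = (node[0] + step * direction[0], node[1] + step * direction[1])
    # Reward shaping on the extended node (assumed values).
    if collides(new):
        r, ok = -1.0, False
    else:
        progress = math.dist(node, GOAL) - math.dist(new, GOAL)
        r = 1.0 if math.dist(new, GOAL) < 0.5 else 0.1 * progress
        ok = True
    if a is not None:
        # One-step Q-Learning update.
        best_next = max(q(cell(new), i) for i in range(len(ACTIONS)))
        Q[(s, a)] = q(s, a) + alpha * (r + gamma * best_next - q(s, a))
    if ok:
        tree.append(new)
    return ok and math.dist(new, GOAL) < 0.5

tree = [(0.0, 0.0)]
for _ in range(5000):
    if expand(tree):
        print("reached goal region; tree size:", len(tree))
        break
```

In a full RL-RRT* implementation the learned values would steer node expansion while the usual RRT* rewiring still guarantees asymptotic path-cost improvement; this sketch only shows the sampling-bias idea that the abstract describes.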
