Abstract

The challenge of the collision avoidance path planning problem lies in adaptively selecting the optimal agent velocity in complex scenarios full of reciprocal obstacles. To achieve autonomous collision avoidance for unmanned surface vehicles (USVs), a distributed multi-USV navigation method based on deep reinforcement learning (DRL) is proposed, which combines the concept of the reciprocal velocity obstacle (RVO) with a DRL scheme to solve the collision avoidance path planning problem under limited information. The collision avoidance behavior in USV navigation follows the International Regulations for Preventing Collisions at Sea, and the RVO algorithm is used to improve the action space and reward function of the proximal policy optimization (PPO) algorithm, yielding an improved PPO algorithm. A Gated Recurrent Unit (GRU)-based neural network maps the states of varying numbers of surrounding obstacles directly into actions. The effectiveness of the method in various situations was verified through simulation experiments. The results show that the algorithm can accurately determine collision situations, produce reasonable collision avoidance behaviors, and achieve effective collision avoidance in complex environments with both dynamic and static obstacles. This research provides a theoretical basis and methodological reference for USV autonomous navigation.
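The abstract's core geometric primitive is the reciprocal velocity obstacle: a candidate velocity is unsafe if, after shifting responsibility for avoidance equally between the two agents, the relative velocity still points into the collision cone toward the obstacle. A minimal sketch of that membership test is below; the function name `in_rvo` and its signature are illustrative, not taken from the paper.

```python
import math

def in_rvo(p_a, p_b, v_a, v_b, cand_v, r_sum):
    """Return True if candidate velocity cand_v for agent A lies inside the
    reciprocal velocity obstacle (RVO) induced by agent B.

    p_a, p_b : (x, y) positions; v_a, v_b : current velocities;
    r_sum : sum of the two agents' radii. Hypothetical helper for illustration.
    """
    # Line of centres and half-angle of the collision cone.
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    dist = math.hypot(dx, dy)
    if dist <= r_sum:
        return True  # already overlapping: every velocity is unsafe
    half_angle = math.asin(r_sum / dist)
    # RVO shifts the cone apex to the average velocity (v_a + v_b) / 2,
    # which is equivalent to testing 2*cand_v - v_a - v_b against the
    # plain velocity obstacle of B.
    rvx = 2 * cand_v[0] - v_a[0] - v_b[0]
    rvy = 2 * cand_v[1] - v_a[1] - v_b[1]
    speed = math.hypot(rvx, rvy)
    if speed == 0.0:
        return False  # zero relative velocity cannot close the gap
    # Angle between the shifted relative velocity and the line of centres.
    cos_ang = (rvx * dx + rvy * dy) / (speed * dist)
    ang = math.acos(max(-1.0, min(1.0, cos_ang)))
    return ang < half_angle
```

In an RVO-shaped DRL scheme such as the one the abstract describes, a test like this could prune unsafe actions from the action space or assign a penalty term in the reward function; the exact integration in the paper is not specified here.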

