Abstract

In crowded waters where obstacle motion information is unknown, traditional methods often fail to ensure safe and autonomous collision avoidance. To address the challenges of information acquisition and decision delay, this study proposes an optimized autonomous navigation strategy that combines deep reinforcement learning with intrinsic and extrinsic rewards. By incorporating random network distillation (RND) into proximal policy optimization (PPO), the approach strengthens the incentive of autonomous ships to explore unknown environments and enables the autonomous generation of intrinsic reward signals for actions. For multi-ship collision avoidance scenarios, an environmental reward is designed based on the International Regulations for Preventing Collisions at Sea (COLREGs); this reward scheme categorizes dynamic obstacles into four collision avoidance situations. Experimental results demonstrate that the proposed algorithm outperforms the standard PPO algorithm, achieving more efficient and safer collision avoidance decision-making in crowded ocean environments with unknown motion information. This research provides a theoretical foundation and a methodological reference for the route deployment of autonomous ships.

