Abstract

A set of continuous state-action space based deep reinforcement learning algorithms is used for the path following of a ship in calm water and waves. The ship dynamics are represented by the mathematical model of a KVLCC2 tanker, which includes the hull force, rudder force, propulsion force, and external wave forces. A look-ahead distance-based guidance algorithm, Line of Sight (LOS), is used to compute the Cross Track Error (CTE) and Heading Error (HE), and the reward function is designed based on HE and CTE. Four Deep Reinforcement Learning (DRL) agents are trained in the created environment: Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC). A common neural network architecture is used for all four agents: the yaw rate, HE, and CTE serve as inputs, and the rudder deflection rate (δ°) constitutes the action space (output). Computation time, average cross-track error, and rudder actuation are computed and compared for the path-following scenarios. DDPG performs best, with the minimum average CTE in all simulated cases, whereas SAC demands the least rudder control effort to achieve the tasks. Finally, the trained agents are validated using Hardware-In-the-Loop (HIL) simulation.
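A minimal sketch of how the LOS guidance errors and the HE/CTE-based reward might be computed is given below; the look-ahead distance, waypoint representation, and reward weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

def los_guidance(pos, heading, wp_prev, wp_next, lookahead):
    """Line of Sight (LOS) guidance: returns cross-track error (CTE) and
    heading error (HE) relative to the path segment wp_prev -> wp_next.
    'lookahead' is an assumed look-ahead distance parameter."""
    path_angle = np.arctan2(wp_next[1] - wp_prev[1], wp_next[0] - wp_prev[0])
    dx, dy = pos[0] - wp_prev[0], pos[1] - wp_prev[1]
    # Cross-track error: lateral offset of the ship from the path, in the path frame
    cte = -dx * np.sin(path_angle) + dy * np.cos(path_angle)
    # Desired heading points at a spot 'lookahead' metres ahead on the path
    desired_heading = path_angle + np.arctan2(-cte, lookahead)
    # Heading error, wrapped to [-pi, pi]
    he = np.arctan2(np.sin(desired_heading - heading),
                    np.cos(desired_heading - heading))
    return cte, he

def reward(cte, he, w_cte=1.0, w_he=0.5):
    """Reward shaped by CTE and HE; the weights here are assumed values."""
    return -(w_cte * abs(cte) + w_he * abs(he))
```

In such a setup, the yaw rate together with the HE and CTE from this routine would form the agent's observation, with the rudder deflection rate as the continuous action.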
