Abstract

A set of continuous state-action space based deep reinforcement learning algorithms is used for the path following of a ship in calm water and waves. The ship dynamics are represented by the mathematical model of a KVLCC2 tanker, which includes the hull force, rudder force, propulsion force, and external wave forces. A look-ahead distance-based guidance algorithm, Line of Sight (LOS), is used to compute the Cross Track Error (CTE) and Heading Error (HE), and the reward function is designed based on HE and CTE. Four Deep Reinforcement Learning (DRL) agents are trained in the created environment: Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC). A common neural network architecture is used for all four agents: the yaw rate, HE, and CTE serve as inputs, and the rudder deflection rate (δ°) constitutes the action space (output). Computation time, average cross-track error, and rudder actuation are computed and compared for the path-following scenarios. DDPG performs best, with the minimum average CTE in all simulated cases, whereas SAC demands the least rudder control effort to achieve the tasks. Finally, the trained agents are validated using Hardware-In-the-Loop (HIL) simulation.
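A minimal sketch of how the LOS guidance errors and the HE/CTE-based reward might be computed is given below; the look-ahead distance, waypoint representation, and reward weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

def los_guidance(pos, heading, wp_prev, wp_next, lookahead):
    """Line of Sight (LOS) guidance: returns cross-track error (CTE) and
    heading error (HE) relative to the path segment wp_prev -> wp_next.
    'lookahead' is an assumed look-ahead distance parameter."""
    path_angle = np.arctan2(wp_next[1] - wp_prev[1], wp_next[0] - wp_prev[0])
    dx, dy = pos[0] - wp_prev[0], pos[1] - wp_prev[1]
    # Cross-track error: lateral offset of the ship from the path, in the path frame
    cte = -dx * np.sin(path_angle) + dy * np.cos(path_angle)
    # Desired heading points at a spot 'lookahead' metres ahead on the path
    desired_heading = path_angle + np.arctan2(-cte, lookahead)
    # Heading error, wrapped to [-pi, pi]
    he = np.arctan2(np.sin(desired_heading - heading),
                    np.cos(desired_heading - heading))
    return cte, he

def reward(cte, he, w_cte=1.0, w_he=0.5):
    """Reward shaped by CTE and HE; the weights here are assumed values."""
    return -(w_cte * abs(cte) + w_he * abs(he))
```

In such a setup, the yaw rate together with the HE and CTE from this routine would form the agent's observation, with the rudder deflection rate as the continuous action.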
