End-to-end deep reinforcement learning for control of an autonomous underwater robot with an undulating propulsor

Ahmad Aws, Arkadij Yuschenko, Vladimir Soloviev

doi:10.31776/rtcj.12105

Abstract

This paper addresses the development and implementation of control algorithms for positioning an Autonomous Underwater Vehicle (AUV) with an undulating propulsor using reinforcement learning methods. It reviews and analyzes prior work that applies the three main families of reinforcement learning methods: Actor-only, Critic-only, and Actor-Critic. The main focus is the Deep Deterministic Policy Gradient method and its implementation with deep neural networks to train an Actor-Critic agent. The agent's architecture uses a replay buffer and target neural networks to address the correlation between training samples, which otherwise makes training unstable. An adaptive architecture is proposed for training the agent to drive the robot from its initial point to any target point. In addition, a random target point generator is used at the training stage so that the agent does not have to be retrained when the target point changes. The training objective is to optimize the actor's policy by optimizing the critic and maximizing the reward function. The reward function is defined in terms of the distance from the robot's center of mass to the target point. Consequently, the reward received by the agent increases as the robot approaches the target point and reaches its maximum when the target point is reached within an acceptable error.
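The abstract does not give formulas, so the following is only a minimal sketch of the components it names, under stated assumptions: a planar (2-D) workspace, a reward equal to the negative distance to the target (zero inside a tolerance radius), uniform sampling of target points, and Polyak averaging for the target networks as in standard Deep Deterministic Policy Gradient. The names `sample_target`, `reward`, `ReplayBuffer`, and `soft_update`, and all numeric values, are hypothetical and are not taken from the paper.

```python
from collections import deque

import numpy as np


def sample_target(rng, low=-5.0, high=5.0):
    """Draw a random 2-D target point (bounds are illustrative assumptions)."""
    return rng.uniform(low, high, size=2)


def reward(position, target, tolerance=0.1):
    """Assumed reward: negative distance to the target, 0 inside the tolerance.

    The reward grows as the robot approaches the target and is maximal (0)
    once the target is reached within the acceptable error.
    """
    distance = float(np.linalg.norm(np.asarray(position) - np.asarray(target)))
    return 0.0 if distance <= tolerance else -distance


class ReplayBuffer:
    """Fixed-size store of transitions; random sampling breaks up the
    correlation between consecutive samples."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions drop out

    def add(self, state, action, reward_value, next_state, done):
        self.storage.append((state, action, reward_value, next_state, done))

    def sample(self, batch_size, rng):
        indices = rng.choice(len(self.storage), size=batch_size, replace=False)
        batch = [self.storage[i] for i in indices]
        # Regroup the list of transitions into one array per field.
        return tuple(np.array(field) for field in zip(*batch))

    def __len__(self):
        return len(self.storage)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a training loop built on this sketch, a new target would be drawn with `sample_target` at the start of each episode and included in the agent's observation, so that a single trained policy can be applied to arbitrary target points. The actor and critic networks themselves are omitted here.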
