Abstract
This paper addresses the development and implementation of control algorithms for positioning an Autonomous Underwater Vehicle (AUV) with an undulating propulsor, using reinforcement learning methods. It provides an analysis and overview of works that apply reinforcement learning methods of the Actor-only, Critic-only, and Actor-Critic types. The main focus is the Deep Deterministic Policy Gradient method and its implementation with deep neural networks to train an Actor-Critic agent. The agent's architecture uses a replay buffer and target neural networks to address the data correlation problem that causes training instability. An adaptive architecture is proposed for training the agent to drive the robot from its initial point to an arbitrary target point. In addition, a random target point generator is used during training so that the agent does not need to be retrained when the target point changes. The training objective is to optimize the actor's policy by optimizing the critic and maximizing the reward function. The reward function is defined in terms of the distance from the robot's center of mass to the target point: the reward received by the agent increases as the robot gets closer to the target point and reaches its maximum when the target point is reached within an acceptable error.
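As an illustration only, the training ingredients named in the abstract (random target generation, a distance-based reward, a replay buffer, and slowly updated target networks) might be sketched as follows. Every name, bound, and constant here is a hypothetical choice rather than the authors' implementation: the planar workspace, the negative-distance reward with a bonus inside the tolerance, and the values of `tolerance`, `bonus`, and `tau` are all assumptions.

```python
import math
import random
from collections import deque


def sample_target(x_range=(-5.0, 5.0), y_range=(-5.0, 5.0), rng=random):
    """Draw a random planar target point (workspace bounds are assumed)."""
    return (rng.uniform(*x_range), rng.uniform(*y_range))


def reward(position, target, tolerance=0.1, bonus=10.0):
    """Negative distance to the target, plus a bonus inside the tolerance.

    The exact shape is an assumption; the abstract states only that the
    reward grows as the robot's center of mass approaches the target.
    """
    distance = math.dist(position, target)
    return -distance + (bonus if distance <= tolerance else 0.0)


class ReplayBuffer:
    """Fixed-size transition store, sampled uniformly to reduce correlation."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions drop out

    def add(self, state, action, reward_value, next_state, done):
        self.storage.append((state, action, reward_value, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def soft_update(target_weights, online_weights, tau=0.005):
    """Polyak averaging: move target weights a small step toward online ones."""
    return [(1.0 - tau) * t + tau * w
            for t, w in zip(target_weights, online_weights)]
```

In a training loop of this kind, a fresh target would be drawn with `sample_target()` at the start of each episode, each transition stored with `ReplayBuffer.add`, and `soft_update` applied to the target actor and critic after every gradient step.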