Tracking control for autonomous underwater vehicles (AUVs) faces multifaceted challenges, making the acquisition of optimal demonstrations a daunting task, and suboptimal demonstrations degrade tracking accuracy. To address the problem of learning from suboptimal demonstrations, this paper proposes a model-free reinforcement learning (RL) method. Our approach uses suboptimal demonstrations to obtain an initial controller, which is then iteratively refined during training. Because the demonstrations are suboptimal, they are removed from the replay buffer once it reaches capacity. Building upon the soft actor-critic (SAC) algorithm, our approach integrates a recurrent neural network (RNN) into the policy network to capture the relationship between states and actions. Moreover, we introduce logarithmic and cosine terms into the reward function to enhance training effectiveness. Finally, we validate the effectiveness of the proposed Initialize Controller from Demonstrations (ICfD) algorithm through simulations on two reference trajectories, for which we define a criterion of tracking success. The success rates of ICfD on the two reference trajectories are 95.60% and 94.05%, respectively, surpassing the state-of-the-art RL method SACfD (80.03% and 90.55%). The average one-step distance errors of ICfD are 1.20 m and 0.76 m, respectively, significantly lower than those of the S-plane controller (9.725 m and 8.325 m). In addition, we evaluate the generalization of the ICfD controller in different scenarios.