Abstract

Model-based control approaches are inadequate to solve the marine surface vehicle (MSV) path-following problem, especially under adverse environments. To effectively deal with the MSV path-following problem, model-free deep reinforcement learning (DRL) based methods have been developed. However, defining an efficient reward function for DRL in path following tasks is rather difficult. Providing expert demonstration is often easier than designing effective reward functions. Thus, we propose a model-free stable adversarial inverse reinforcement learning (SAIRL) algorithm that only adopts the state of MSV and reconstructs the reward function from the expert demonstration. The SAIRL algorithm is designed to guarantee the prescribed MSV path following accuracy and training stability. It utilizes an alternative loss function and dual-discriminator framework to dissolve the issue of policy collapse, which arises due to the vanishing gradient of the discriminator. Simulations and experiments have validated that the SAIRL algorithm outperforms other baseline algorithms in terms of path-following accuracy and stability of convergence.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call