Abstract

Autonomous underwater vehicles (AUVs) are widely used for complex underwater tasks such as seafloor exploration. In recent years, deep reinforcement learning (DRL) has been applied to AUV control because of its capability to improve AUV autonomy. However, designing an effective reward function for DRL methods is usually very difficult. Generative adversarial imitation learning (GAIL) allows AUVs to learn control policies from expert demonstrations instead of pre-defined reward functions, but it requires optimal expert demonstrations and cannot surpass the demonstrations it is given. This paper builds upon the GAIL algorithm to let AUVs learn control policies from expert demonstrations. We propose an importance-reweighting generative adversarial imitation learning (WGAIL) algorithm that uses confidence scores to indicate the optimality of demonstrated trajectories, enabling AUVs to learn control policies from expert demonstrations of different quality levels. Experimental results on a simulated AUV system, modeling our lab's Sailfish 210 in the Gazebo simulation environment, show that an AUV trained via WGAIL achieves better performance than one trained via GAIL across different levels of sub-optimal expert demonstrations. Moreover, control policies trained via WGAIL in simple tasks generalize better to complex tasks than those trained via GAIL, greatly extending the applicability of AUV learning from expert demonstrations.
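To illustrate the reweighting idea described above, the following is a minimal sketch, not the paper's exact objective: a GAIL-style binary cross-entropy discriminator loss in which each expert sample's contribution is scaled by a per-trajectory confidence score in [0, 1], so that low-confidence (sub-optimal) demonstrations influence the discriminator less. The function name and normalization scheme are illustrative assumptions.

```python
import numpy as np

def weighted_discriminator_loss(d_expert, d_policy, confidence):
    """Confidence-weighted GAIL discriminator loss (illustrative sketch).

    d_expert:   discriminator outputs D(s, a) in (0, 1) on expert samples
    d_policy:   discriminator outputs D(s, a) in (0, 1) on policy samples
    confidence: per-sample confidence scores in [0, 1], one per expert
                sample, indicating how optimal its source trajectory is
                (assumed weighting scheme, not the paper's exact form)
    """
    d_expert = np.clip(np.asarray(d_expert, dtype=float), 1e-8, 1 - 1e-8)
    d_policy = np.clip(np.asarray(d_policy, dtype=float), 1e-8, 1 - 1e-8)
    confidence = np.asarray(confidence, dtype=float)

    # Expert term: -log D(s, a), reweighted by confidence and normalized
    # by the total confidence mass so weights act like importance weights.
    expert_term = -np.sum(confidence * np.log(d_expert)) / np.sum(confidence)
    # Policy term: standard unweighted -log(1 - D(s, a)).
    policy_term = -np.mean(np.log(1.0 - d_policy))
    return expert_term + policy_term
```

Down-weighting a demonstration's confidence toward zero removes its pull on the discriminator, which is how mixed-quality demonstration sets can be used without letting poor trajectories dominate training.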
