Abstract
With the increasing demand for ocean exploration, higher requirements for both autonomy and intelligence have been placed on the development of Autonomous Underwater Vehicles (AUVs). To this end, deep reinforcement learning methods have in recent years been applied to improve AUV autonomy and intelligence. However, the low learning efficiency and high learning cost of traditional deep reinforcement learning prevent its application to physical AUV systems in real underwater environments. Therefore, this paper proposes a deep interactive reinforcement learning method based on the Deep Deterministic Policy Gradient (DDPG) algorithm for the continuous motion control of AUV path following. The highlight of our proposed method is the design of a new reward allocator. Specifically, unlike current deep interactive reinforcement learning methods, we allow the human trainer to provide a preferred action based on an evaluation of the AUV's current situation. The reward allocator then assigns rewards indirectly, based on the preferred action, to cope with the high frequency of the AUV's continuous action changes. The proposed method was tested on a sinusoidal curve following task on the Gazebo simulation platform with our lab's AUV simulator. The experimental results and analysis show that with our proposed method, the AUV learns a more stable path-following policy about 100 episodes faster than when learning from environmental rewards or human rewards alone.
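To make the reward-allocator idea concrete, the sketch below shows one plausible way such a component could work. The abstract does not specify the allocation rule, so everything here is an assumption: the class name `RewardAllocator`, the distance-based shaping rule that compares the executed continuous action with the trainer's preferred action, and the additive blending with the environmental reward are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

class RewardAllocator:
    """Hypothetical reward allocator: converts a human trainer's
    preferred action into a scalar shaping reward by comparing it
    with the continuous action the DDPG policy actually executed.
    (Illustrative sketch; not the paper's actual allocation rule.)"""

    def __init__(self, action_low, action_high, scale=1.0):
        self.action_low = np.asarray(action_low, dtype=np.float64)
        self.action_high = np.asarray(action_high, dtype=np.float64)
        self.scale = scale  # weight of the human-derived reward term

    def allocate(self, executed_action, preferred_action):
        """Return a reward in [-scale, +scale]: close to +scale when
        the executed action matches the trainer's preference, and
        negative when it deviates strongly."""
        a = np.asarray(executed_action, dtype=np.float64)
        p = np.asarray(preferred_action, dtype=np.float64)
        span = self.action_high - self.action_low
        # Per-dimension deviation, normalized to [0, 1] by the action range.
        dist = np.abs(a - p) / span
        # Map the mean deviation linearly to a reward in [-scale, +scale].
        return self.scale * (1.0 - 2.0 * float(np.mean(dist)))

# Example: blend the human-derived reward with the environmental reward
# (the blending scheme is likewise an assumption for illustration).
allocator = RewardAllocator(action_low=[-1.0], action_high=[1.0], scale=0.5)
env_reward = -0.2  # e.g., negative cross-track error from the path
r_human = allocator.allocate(executed_action=[0.3], preferred_action=[0.4])
total_reward = env_reward + r_human  # signal stored in the DDPG replay buffer
```

One appeal of an indirect rule like this is that the trainer only needs to demonstrate a preferred action, rather than grade every one of the AUV's rapidly changing continuous actions, which matches the motivation given in the abstract.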