Abstract

Autonomous underwater vehicles (AUVs) play an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve autonomy. However, these methods are still difficult to apply directly to actual AUV systems because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for path following of an AUV by combining the advantages of deep reinforcement learning and interactive RL. In addition, since a human trainer cannot provide human rewards for an AUV running in the ocean, and the AUV needs to adapt to a changing environment, we further propose a deep reinforcement learning method that learns from both human rewards and environmental rewards at the same time. We test our methods in two path following tasks, straight line and sinusoidal curve following, by simulating an AUV on the Gazebo platform. Our experimental results show that with our proposed deep interactive RL method, the AUV converges faster than a DQN learner trained only on environmental rewards. Moreover, an AUV learning with our deep RL method from both human and environmental rewards achieves similar or even better performance than with deep interactive RL alone and can adapt to the actual environment by further learning from environmental rewards.
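
As a rough illustration of the idea of learning from human and environmental rewards at the same time, the following minimal sketch blends the two signals inside a one-step Q-learning update. The convex blending with coefficient `beta`, the fallback when no human feedback arrives, and all function names are our own assumptions for illustration; they are not taken from the paper, which builds on DQN.

```python
import numpy as np

# Hypothetical sketch: blend human and environmental reward signals
# in a tabular Q-learning update. `beta` and all names below are
# illustrative assumptions, not the paper's actual integration scheme.

def combined_reward(r_env, r_human, beta=0.5):
    """Convex combination of human and environmental rewards; fall back
    to the environmental reward when no human feedback was given."""
    if r_human is None:
        return r_env
    return beta * r_human + (1.0 - beta) * r_env

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One-step Q-learning backup using the blended reward r."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```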

Highlights

  • In recent years, the role of autonomous underwater vehicles (AUVs) in ocean exploration has become increasingly important

  • Since a human trainer cannot provide human rewards for an AUV running in the ocean, and the AUV needs to adapt to a changing environment, we propose a deep reinforcement learning method that learns from both human rewards and environmental rewards at the same time

  • We test our methods in two tasks, straight line and sinusoidal curve following of an AUV, by simulating in Gazebo

Introduction

The role of autonomous underwater vehicles (AUVs) in ocean exploration has become increasingly important. Equipped with a series of chemical and biological sensors, an AUV can operate continuously in the ocean environment without human intervention. It can work independently, adjusting to changes in the marine environment to complete ocean observation tasks. Reinforcement learning is commonly formalized as a Markov decision process (MDP). An MDP mainly consists of five elements: agent, environment, state, action, and reward. An agent interacts with the environment by acquiring the environment state, performing actions, and obtaining rewards. The environment generates a feedback reward r_{t+1} to the agent in the new state s_{t+1}. The agent updates the learned policy with the reward signal and performs a new action a_{t+1} in the new state. The agent optimizes the policy by continually interacting with the environment until an optimal policy is learned. The agent's goal is to maximize the long-term cumulative reward.
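
To make the agent-environment loop concrete, below is a minimal sketch of one episode of tabular Q-learning over such an MDP. The Gym-style environment interface (env.reset, env.step), the epsilon-greedy exploration, and the hyperparameter values are assumptions for illustration; the paper itself uses a DQN rather than a tabular learner. The objective being maximized is the expected discounted return G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ⋯.

```python
import numpy as np

# Minimal sketch of the MDP interaction loop described above, assuming
# a Gym-style discrete environment. Epsilon-greedy exploration and all
# hyperparameters are illustrative assumptions, not the paper's setup.

def run_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
    s = env.reset()                       # initial state s_0
    done, total_reward = False, 0.0
    while not done:
        # Explore with probability epsilon, otherwise act greedily.
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done, _ = env.step(a)  # observe s_{t+1} and r_{t+1}
        # One-step backup toward r + gamma * max_a' Q(s_next, a').
        target = r + gamma * np.max(Q[s_next]) * (not done)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
        total_reward += r
    return total_reward
```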
