Autonomous manipulation represents highly intelligent coordination of robotic vision and control and is a hallmark of advances in robotic intelligence. The limitations of visual sensing and increasingly complex experimental conditions make autonomous manipulation difficult, particularly for deep reinforcement learning methods, which can enhance the intelligence of robotic control but require extensive training. Because underwater operations are characterized by high-dimensional continuous state and action spaces, this paper adopts policy-based reinforcement learning as the foundational approach. To address the instability and low convergence efficiency of traditional policy-based reinforcement learning algorithms, this paper proposes a novel policy learning method built on the clipped Proximal Policy Optimization algorithm (PPO-Clip) and optimized through an actor-critic network, with the aim of improving the stability and efficiency of convergence during learning. To address reward sparsity during training in the underwater environment, a new reward shaping scheme is designed: a manually crafted dense reward function serves as attractive and repulsive potential functions for goal manipulation and obstacle avoidance, respectively. For the highly complex underwater manipulation and training environment, a transfer learning algorithm is established to reduce the amount of training required and to compensate for the differences between simulation and experiment. Simulations and tank experiments verify the performance of the proposed policy learning method.
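As a hedged illustration of the two core ingredients named in the abstract, the sketch below shows a standard PPO-Clip actor-critic loss and a potential-field-style dense reward combining attraction to the goal with repulsion from obstacles. Only the clipped surrogate objective follows the canonical PPO-Clip formulation; the function names, gains (`clip_eps`, `value_coef`, `k_att`, `k_rep`, `d0`), and exact potential forms are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import torch
import torch.nn.functional as F


def ppo_clip_loss(new_log_probs, old_log_probs, advantages,
                  values, returns, clip_eps=0.2, value_coef=0.5):
    """PPO-Clip surrogate loss with an actor-critic value term.

    All arguments are 1-D torch tensors over a batch of transitions;
    clip_eps and value_coef are illustrative defaults, not values
    taken from the paper.
    """
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_theta / pi_theta_old
    surr_unclipped = ratio * advantages
    surr_clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    actor_loss = -torch.min(surr_unclipped, surr_clipped).mean()  # clipped surrogate
    critic_loss = F.mse_loss(values, returns)  # critic regression to returns
    return actor_loss + value_coef * critic_loss


def shaped_reward(base_reward, ee_pos, goal_pos, obstacles,
                  k_att=1.0, k_rep=0.5, d0=1.0):
    """Dense reward built from attractive/repulsive potentials (assumed forms).

    ee_pos, goal_pos: 3-D numpy positions of the end effector and goal;
    obstacles: iterable of 3-D obstacle centres. The gains and the
    influence radius d0 are placeholders for illustration.
    """
    reward = base_reward - k_att * np.linalg.norm(goal_pos - ee_pos)  # attraction to goal
    for ob in obstacles:
        d = np.linalg.norm(ee_pos - ob)
        if d < d0:  # penalise only inside the obstacle's influence radius
            reward -= k_rep * (1.0 / d - 1.0 / d0) ** 2  # repulsion near obstacles
    return reward
```

The repulsive term follows the classical artificial-potential-field form, which matches the abstract's description of attractive and repulsive potentials in spirit; the paper's actual shaping function may differ.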