Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics

Jie Zhong,Tao Wang,Lianglun Cheng

doi:10.1007/s40747-021-00366-1

Open Access

https://doi.org/10.1007/s40747-021-00366-1

Abstract

In actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for a welding manipulator surrounded by obstacles. However, the sampling-based planner, a state-of-the-art method, guarantees only probabilistic completeness, and its computational complexity is sensitive to the state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning that solves path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and less sensitive to the state dimension. Specifically, to improve learning efficiency, we introduce an inverse kinematics module that provides prior knowledge and design a gain module that avoids locally optimal policies; both are integrated into the training algorithm. To evaluate the proposed planning algorithm along multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves convergence performance but is also superior to most other planning algorithms in the optimality and robustness of planning.

Highlights

Welding tasks exist in various industrial manufacturing processes
We give an overview of the background theories for the proposed Deep Reinforcement Learning (Deep-RL)-based collision-free path planner, including the kinematics modeling of the welding manipulator used in this research, the Sequential Decision-Making model, and the model-free Deep-RL algorithm Deep Deterministic Policy Gradient (DDPG)
The objective of the Deep-RL algorithm is to maximize the cumulative reward in one episode with finite time steps, so it is necessary to analyze the trend of the reward curve (the learning curve) against the training episode number, which reflects both whether the target deterministic policy model has converged and how efficiently it learns
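
The quantities named in the last highlight can be illustrated with a minimal sketch. It assumes per-step rewards are logged as a plain list for each episode; the function names `episode_return` and `moving_average`, the optional discount factor `gamma`, and the smoothing window are hypothetical choices for illustration and are not taken from the paper.

```python
def episode_return(rewards, gamma=1.0):
    """Cumulative reward of one finite episode.

    gamma=1.0 gives the plain sum; gamma < 1 gives a discounted return
    (the discount is an assumption here, not a value from the paper).
    """
    g = 0.0
    # Accumulate backwards so each step adds r_t + gamma * (return after t).
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def moving_average(values, window):
    """Trailing moving average used to smooth a noisy learning curve.

    The first few entries average over fewer than `window` episodes.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


if __name__ == "__main__":
    # Hypothetical reward logs for four training episodes.
    logs = [[-1.0, -1.0, 0.5], [-1.0, 0.5, 0.5], [0.5, 0.5, 1.0], [1.0, 1.0, 1.0]]
    returns = [episode_return(ep) for ep in logs]
    print(moving_average(returns, window=2))
```

A smoothed curve that flattens out suggests the policy has converged, while the number of episodes it takes to get there gives a rough picture of learning efficiency.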

Summary

Introduction

Welding tasks exist in various industrial manufacturing processes. Shipbuilding, known as a labor-intensive industry, requires a considerable number of skilled technicians to weld in enclosed and hazardous surroundings. The sampling-based method is one of the most popular path planning methods owing to its probabilistic completeness. We give an overview of the background theories for the proposed Deep-RL-based collision-free path planner, including the kinematics modeling of the welding manipulator used in this research, the Sequential Decision-Making model, and the model-free Deep-RL algorithm DDPG. Most of the motion specified by the task is defined in Cartesian space, so it is inevitable to map the end-effector’s. Given the end-effector’s Cartesian velocity $\dot{x}_e$ relative to itself, the corresponding velocity of the joint angles $\dot{q}$ is as follows: $\dot{q} = J(q)^{\dagger}\,\dot{x}_e$ (4).
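
The pseudoinverse relation in Eq. (4) can be sketched numerically. The example below is only an illustration under stated assumptions: it uses a hypothetical planar three-link arm (not the welding manipulator from the paper), expresses the end-effector velocity in the base frame rather than the end-effector frame, and computes the right pseudoinverse $J^{\dagger} = J^{T}(JJ^{T})^{-1}$, which is valid only when $J$ has full row rank. The names `jacobian`, `pinv_right`, and `joint_velocity` are made up for this sketch.

```python
import math


def jacobian(q, lengths):
    """2 x n position Jacobian of a planar serial arm (toy model, an assumption)."""
    n = len(q)
    # Absolute orientation of each link is the running sum of joint angles.
    cum, total = [], 0.0
    for qi in q:
        total += qi
        cum.append(total)
    J = [[0.0] * n for _ in range(2)]
    for j in range(n):
        # Joint j moves every link from j to the tip.
        J[0][j] = -sum(lengths[k] * math.sin(cum[k]) for k in range(j, n))
        J[1][j] = sum(lengths[k] * math.cos(cum[k]) for k in range(j, n))
    return J


def pinv_right(J):
    """Right pseudoinverse J^T (J J^T)^-1 of a full-row-rank 2 x n matrix."""
    n = len(J[0])
    a = sum(J[0][k] * J[0][k] for k in range(n))
    b = sum(J[0][k] * J[1][k] for k in range(n))
    d = sum(J[1][k] * J[1][k] for k in range(n))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("Jacobian is (near-)singular at this configuration")
    inv = [[d / det, -b / det], [-b / det, a / det]]
    # Multiply J^T (n x 2) by the 2 x 2 inverse.
    return [[J[0][i] * inv[0][c] + J[1][i] * inv[1][c] for c in range(2)]
            for i in range(n)]


def joint_velocity(q, lengths, xe_dot):
    """Joint velocities q_dot = pinv(J(q)) @ xe_dot, as in Eq. (4)."""
    Jp = pinv_right(jacobian(q, lengths))
    return [row[0] * xe_dot[0] + row[1] * xe_dot[1] for row in Jp]


if __name__ == "__main__":
    q = [0.3, 0.5, -0.4]          # joint angles in radians (illustrative)
    lengths = [1.0, 0.8, 0.5]     # link lengths in metres (illustrative)
    print(joint_velocity(q, lengths, [0.1, -0.2]))
```

Because the arm has more joints than task dimensions, infinitely many joint velocities produce the same end-effector velocity; the pseudoinverse selects the one with the smallest Euclidean norm.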

Objectives

Methods

Results

Conclusion