Abstract

To address the slow convergence and high randomness of the original DQN (Deep Q-Network) algorithm in manipulator trajectory planning, the advantages of MPC (Model Predictive Control) are fused with deep reinforcement learning, and an MPC-guided sampling DQN algorithm is proposed. First, the method reduces the number of failures during training by applying constraint control to the manipulator based on a dynamic model. Second, the MPC algorithm is run from different initial states, and the resulting trajectories are sampled and stored after iterative optimization by a linear-Gaussian controller; these high-success-rate samples speed up the training of the neural network. Finally, a virtual simulation environment for the manipulator is built on the CoppeliaSim platform to validate the algorithm. The results show that the improved DQN algorithm increases learning efficiency by nearly 1.5 times, significantly outperforming the original DQN algorithm.
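To illustrate the guided-sampling idea, the following is a minimal Python sketch, not the authors' implementation: an MPC-style controller is rolled out from varied initial states, and only successful trajectories are stored to pre-fill a DQN replay buffer. The `env`/`env_model` interfaces, the `reached_goal` flag, and the discrete action set are all hypothetical, and the paper's linear-Gaussian controller is replaced here with a simple random-shooting rollout for brevity.

import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def mpc_action(env_model, state, candidate_actions, horizon=5):
    """Pick the discrete action whose short model rollout accumulates the
    highest predicted return (a crude random-shooting MPC step)."""
    best_action, best_return = None, -np.inf
    for a in candidate_actions:
        s, act, total = state, a, 0.0
        for _ in range(horizon):
            s, r = env_model.predict(s, act)        # assumed model interface
            total += r
            act = random.choice(candidate_actions)  # random tail rollout
        if total > best_return:
            best_action, best_return = a, total
    return best_action

def collect_guided_samples(env, env_model, actions, buffer, episodes=20):
    """Run the MPC controller from different initial states and keep only
    successful trajectories, mirroring the guided-sampling step."""
    for _ in range(episodes):
        state = env.reset()               # a new initial state each episode
        trajectory, done, success = [], False, False
        while not done:
            action = mpc_action(env_model, state, actions)
            next_state, reward, done, info = env.step(action)
            trajectory.append((state, action, reward, next_state, done))
            success = info.get("reached_goal", False)
            state = next_state
        if success:                       # store only high-success samples
            for t in trajectory:
                buffer.push(t)

In this sketch, DQN training would then draw minibatches from the pre-filled buffer from the start, which is one plausible way such high-quality samples could accelerate early learning.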
