Abstract

To address the low success rate and slow learning speed of the deep deterministic policy gradient (DDPG) algorithm in path planning of a mobile robot in a dynamic environment, an improved DDPG algorithm is designed. In this article, the RAdam algorithm replaces the neural network optimizer in DDPG and is combined with a curiosity algorithm to improve the success rate and convergence speed. On top of the improved algorithm, priority experience replay is added, and transfer learning is introduced to improve the training effect. A dynamic simulation environment is established with the Robot Operating System (ROS) and the Gazebo simulation software, and the improved DDPG algorithm is compared with the original DDPG algorithm. For the dynamic path planning task of the mobile robot, the simulation results show that the convergence speed of the improved DDPG algorithm is increased by 21% and the success rate rises to 90% compared with the original DDPG algorithm. The improved algorithm performs well on dynamic path planning for mobile robots with a continuous action space.

Highlights

  • Path planning is a very important part of the autonomous navigation of robots. The robot path planning problem can be described as finding an optimal path from the current point to a specified target point in the robot's working environment, according to one or more optimization objectives, under the condition that the robot's position is known [1, 2]

  • A new Deep Deterministic Policy Gradient (DDPG) algorithm is proposed in which the RAdam algorithm replaces the neural network optimizer of the original algorithm, combined with the curiosity algorithm to improve the success rate and convergence speed, and priority experience replay and transfer learning are introduced. The raw data are obtained through the lidar carried by the mobile robot, dynamic obstacle information is extracted, and the improved algorithm is applied to the path planning of the mobile robot in the dynamic environment so that it can move safely from the starting point to the end point in a short time, obtain the shortest path, and verify the effectiveness of the improved algorithm

  • The RAdam algorithm [19] is a recently proposed optimizer that converges quickly and with high precision, and it can effectively rectify the variance of adaptive learning-rate methods. Therefore, the RAdam algorithm is introduced into the DDPG algorithm to solve the low success rate and slow convergence speed of mobile robot path planning in the dynamic environment caused by the neural network variance problem [20]. The RAdam update rule can be expressed as follows:
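The rectified update referred to in the last highlight (RAdam, Liu et al.) can be sketched for a single scalar parameter as below. This is a minimal illustration, not the article's implementation: the helper name `radam_step`, the dictionary-based state, and the toy quadratic objective are all assumptions made for the example.

```python
import math

def radam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update for a scalar parameter (illustrative sketch)."""
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad         # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad  # second moment
    m_hat = state["m"] / (1 - beta1 ** t)                        # bias-corrected momentum
    rho_inf = 2.0 / (1 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1 - beta2 ** t)    # approx. SMA length
    if rho_t > 4.0:
        # variance of the adaptive term is tractable: apply the rectifier r_t
        v_hat = math.sqrt(state["v"] / (1 - beta2 ** t))
        r_t = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                        / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        return theta - lr * r_t * m_hat / (v_hat + eps)
    # early steps: fall back to an un-adapted momentum update
    return theta - lr * m_hat

# toy usage: minimize f(x) = x^2 starting from x = 1.0
state = {"t": 0, "m": 0.0, "v": 0.0}
x = 1.0
for _ in range(200):
    x = radam_step(x, 2.0 * x, state, lr=0.05)
```

The early-step fallback is what distinguishes RAdam from plain Adam: while too few gradients have been seen to estimate the adaptive learning rate reliably, the division by the second-moment term is skipped entirely.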

Introduction

Path planning is a very important part of the autonomous navigation of robots. The robot path planning problem can be described as finding an optimal path from the current point to a specified target point in the robot's working environment, according to one or more optimization objectives, under the condition that the robot's position is known [1, 2]. In 2019, in order to solve the long-distance path planning problem of outdoor robots, Huang [8] proposed an improved D* algorithm combined with Gaode mapping based on a vector model. Therefore, the DQN algorithm [13] came into being; it usually solves problems with discrete, low-dimensional action spaces. When the DDPG algorithm is applied to path planning in a dynamic environment, it has shortcomings such as a low success rate and slow convergence speed, and most related research stays at the theoretical level, lacking solutions to practical problems. A new DDPG algorithm is therefore proposed in which the RAdam algorithm replaces the neural network optimizer of the original algorithm, combined with the curiosity algorithm to improve the success rate and convergence speed, and with priority experience replay and transfer learning introduced on top. The organizational structure of this article is as follows: the first section is the introduction; the second section introduces the DDPG algorithm principle and network parameter settings; the third section presents the path planning design of the improved DDPG algorithm; the fourth section presents the simulation experiments and result analysis; and the last section gives the conclusions.
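The priority experience replay mentioned in the introduction can be sketched as a minimal proportional buffer. The class and method names below are illustrative, not from the article, and a production implementation would use a sum-tree rather than the linear scan shown here.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # 0 = uniform sampling, 1 = fully priority-driven
        self.data, self.priorities = [], []
        self.pos = 0                # ring-buffer write position

    def add(self, transition):
        # new transitions get the current max priority so each is replayed at least once
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # sample indices with probability proportional to priority^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=scaled, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # after a training step, priorities track the magnitude of the TD error
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps

# toy usage
buf = PrioritizedReplayBuffer(capacity=4)
for i in range(6):
    buf.add(("state", i))
idx, batch = buf.sample(3)
buf.update_priorities(idx, [0.5, 0.1, 0.9])
```

Sampling transitions in proportion to their TD error focuses updates on the experiences the critic currently predicts worst, which is the mechanism credited here with improving the training effect.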

DDPG Algorithm Principle and Network Parameter Setting
DDPG Network Parameter Setting
Path Planning Design of Improved DDPG Algorithm
Simulation Experiment and Result Analysis
Findings
Conclusion