Abstract

This paper proposes an improved TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm to address the low success rate and slow training speed of the original TD3 algorithm in mobile robot path planning in dynamic environments. First, prioritized experience replay and transfer learning are introduced to improve learning efficiency: the probability that beneficial experiences in the replay buffer are sampled is increased, and a model pre-trained in an obstacle-free environment is used as the initial model for training in the dynamic environment. Second, a dynamic delayed-update strategy is devised and Ornstein-Uhlenbeck (OU) noise is added to raise the success rate of path planning: dynamically changing the delayed-update interval reduces the probability of missing high-quality value estimates, and the temporally correlated OU noise improves the exploration of the mobile robot's inertial navigation system in the dynamic environment. The algorithm is tested in simulation, with the TurtleBot3 robot model as the training object and the ROS Melodic operating system with the Gazebo simulator as the experimental environment. The results show that the improved TD3 algorithm achieves a 16.6% increase in success rate and a 23.5% reduction in training time. Finally, a generalization experiment was designed, and it indicates that the improved TD3 algorithm achieves superior generalization performance in mobile robot path planning with continuous action spaces.
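Since the abstract only names prioritized experience replay without giving implementation details, the following is a minimal sketch of the standard proportional variant on which such schemes are typically built; the class name PrioritizedReplay and the hyperparameters alpha, beta, and eps are illustrative assumptions, not values from the paper.

    import numpy as np

    class PrioritizedReplay:
        """Proportional prioritized replay: sampling probability ~ priority^alpha."""

        def __init__(self, capacity, alpha=0.6):
            self.capacity = capacity
            self.alpha = alpha  # how strongly priority shapes the sampling distribution
            self.buffer = []
            self.priorities = np.zeros(capacity, dtype=np.float64)
            self.pos = 0

        def add(self, transition):
            # A new experience receives the current maximum priority,
            # so it is sampled at least once before being down-weighted.
            max_p = self.priorities[:len(self.buffer)].max() if self.buffer else 1.0
            if len(self.buffer) < self.capacity:
                self.buffer.append(transition)
            else:
                self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_p
            self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size, beta=0.4):
            p = self.priorities[:len(self.buffer)] ** self.alpha
            probs = p / p.sum()
            idx = np.random.choice(len(self.buffer), batch_size, p=probs)
            # Importance-sampling weights correct the bias of non-uniform sampling.
            weights = (len(self.buffer) * probs[idx]) ** (-beta)
            weights /= weights.max()
            return idx, [self.buffer[i] for i in idx], weights

        def update_priorities(self, idx, td_errors, eps=1e-6):
            # Larger TD error marks a more "beneficial" experience,
            # raising its probability of being sampled again.
            self.priorities[idx] = np.abs(td_errors) + eps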
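The OU noise mentioned in the abstract is a standard temporally correlated exploration process; a minimal sketch follows, assuming the usual Ornstein-Uhlenbeck parameterization dx = theta*(mu - x)*dt + sigma*sqrt(dt)*dW. The parameter values (theta=0.15, sigma=0.2, dt=1e-2) are common defaults, not values reported in the paper.

    import numpy as np

    class OUNoise:
        """Ornstein-Uhlenbeck process for smooth, temporally correlated action noise."""

        def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
            self.mu = mu * np.ones(action_dim)
            self.theta = theta   # pull strength toward the long-run mean mu
            self.sigma = sigma   # scale of the random perturbation
            self.dt = dt
            self.rng = np.random.default_rng(seed)
            self.reset()

        def reset(self):
            # Restart the process at its mean at the start of each episode.
            self.state = self.mu.copy()

        def sample(self):
            # Each sample drifts toward mu while retaining memory of the previous
            # value, which suits inertial systems better than uncorrelated noise.
            dx = (self.theta * (self.mu - self.state) * self.dt
                  + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.mu.shape))
            self.state = self.state + dx
            return self.state

In use, the noise would be added to the deterministic policy output before clipping the action to its valid range, e.g. action = np.clip(policy(state) + noise.sample(), -1.0, 1.0).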
