Abstract

This paper proposes a deep deterministic policy gradient (DDPG) algorithm based on clipped double Q-learning and a long short-term memory (LSTM) neural network for the trajectory tracking control of wheeled mobile robots (WMRs). First, to address the overestimation of state-action values in the DDPG algorithm, a double critic network is introduced to approximate the value function; the minimum of the two estimates is taken when computing the update target, preventing the algorithm from converging to a poor local optimum. Then, to overcome the difficulty caused by partial observability in reinforcement learning control, the actor generates the current action from historical state information, using an LSTM neural network to process the temporal relations among those historical states. Finally, a reinforcement learning agent based on the improved LSTM-DDPG algorithm is designed as the kinematic controller of the WMR and provides reference values for the dynamic controller (two independent PI controllers). Simulation results verify the effectiveness of the proposed control scheme for WMR trajectory tracking, even in the presence of external disturbances and model parameter uncertainties.
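The clipped double Q-learning idea summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, arguments, and the use of NumPy are assumptions, and the two next-state value estimates would in practice come from two target critic networks evaluated at the target actor's action.

```python
import numpy as np

def clipped_double_q_target(reward, gamma, q1_next, q2_next, done):
    """Hypothetical helper: compute the clipped double-Q update target.

    Takes the minimum of the two target-critic estimates (q1_next, q2_next)
    for the next state-action pair, which curbs the overestimation bias of
    a single critic. `done` is 1.0 at terminal states, masking the bootstrap.
    """
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)
```

For example, with reward 1.0, discount 0.99, and critic estimates 2.0 and 3.0 at a non-terminal next state, the target is 1.0 + 0.99 × min(2.0, 3.0) = 2.98; a single critic using the larger estimate 3.0 would instead produce 3.97, illustrating the bias the clipping removes.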
