A longitudinal platoon control method that combines Twin Delayed Deep Deterministic Policy Gradient (TD3) with Model Predictive Control (MPC) is proposed to address low following efficiency and system instability in longitudinal platoon control. First, a Dynamic Bayesian Network (DBN) and a Long Short-Term Memory (LSTM) network are introduced to identify the driving behavior of surrounding vehicles, and an MPC objective function with constraints, covering three indicators (following performance, comfort, and fuel consumption), is derived from the platoon dynamics equations. Second, the system prediction model and cost function are embedded in the actor-critic network of TD3, which compensates for the model-free training of the conventional TD3 algorithm and improves both the training speed and the accuracy of the network. On this basis, a Bellman equation is used to compute the temporal-difference error and the expected loss function, mitigating the overestimation problem of the TD3 network. Finally, a co-simulation platform is built to simulate platoon driving conditions, and the method is compared with the DDPG-optimized MPC algorithm (DDPG-MPC) and the conventional MPC algorithm. The results show that the improved TD3-MPC algorithm satisfies the constraints and keeps the spacing error within 0.3 m. When the speed of the preceding vehicle fluctuates, the vehicle speed changes more smoothly and the ride comfort of the platoon improves; in the vehicle cut-in scenario, the algorithm is more robust. At vehicle speeds of 20, 40, and 60 km/h, the average spacing errors are reduced by 27.56%, 25.64%, and 28.04%, respectively, compared with MPC, and by 6.59%, 8.77%, and 7.86%, respectively, compared with DDPG-MPC.
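The Bellman-based error computation mentioned above can be illustrated with the standard TD3 clipped double-Q target, which takes the smaller of two target-critic estimates to limit overestimation. The abstract does not give the authors' exact formulation, so the following is only a minimal sketch under assumptions: the function names, the discount factor, and the sample values are hypothetical, and the MPC prediction model, the MPC cost terms, and TD3's target policy smoothing noise are all omitted.

```python
import numpy as np

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q Bellman target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    q_min = np.minimum(q1_next, q2_next)  # pessimistic estimate of the next-state value
    return reward + gamma * (1.0 - done) * q_min

def td_errors(q1, q2, target):
    """Temporal-difference errors of the two critics against the shared target."""
    return target - q1, target - q2

def critic_loss(q1, q2, target):
    """Sum of the mean squared TD errors of both critics."""
    d1, d2 = td_errors(q1, q2, target)
    return float(np.mean(d1 ** 2) + np.mean(d2 ** 2))

# Hypothetical batch of two transitions (values chosen only for illustration).
reward = np.array([1.0, 0.5])
done = np.array([0.0, 1.0])        # the second transition is terminal
q1_next = np.array([10.0, 4.0])    # target critic 1 at the next state
q2_next = np.array([8.0, 6.0])     # target critic 2 at the next state

y = td3_target(reward, done, q1_next, q2_next, gamma=0.9)
loss = critic_loss(np.array([8.0, 0.5]), np.array([8.2, 1.0]), y)
```

In a TD3-MPC scheme of the kind described, the reward would presumably be built from the MPC cost (following, comfort, and fuel-consumption terms), but that mapping is an assumption here and is not specified in the abstract.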