Abstract

Building on the traditional DQN (Deep Q-Network)-based signalized intersection control strategy, this research introduces an improved Distributional DQN to establish a signal optimization decision-making model using reinforcement learning based on value distributions. Unlike the DQN model, which relies on expected values, the proposed model makes full use of the intersection environment information in each phase action to estimate the distribution of the future total return. The proposed model also formulates the optimization objective as minimizing the KL divergence between the estimated distribution and the true distribution, which makes the loss easier to minimize and accelerates the convergence of the model. In addition, a fixed boundary is added to the discrete distribution of the return of each phase action, which effectively suppresses the reward shocks caused by the large randomness of traffic flow and reduces the instability of the algorithm. The simulation results demonstrate that the Distributional DQN proposed in this paper converges faster than the original DQN, reduces the cumulative delay at the intersection by about 13.1%, and increases the average driving speed by 7.1%, further improving the control efficiency of signalized intersections.
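To make the mechanics described above concrete, the following is a minimal sketch of a C51-style categorical distributional DQN update of the kind the abstract describes: a discrete return distribution over a fixed support [V_MIN, V_MAX] (the "fixed boundary" that clips the projected Bellman targets and damps reward shocks from stochastic traffic flow), trained with a cross-entropy loss that is equivalent to minimizing the KL divergence between the projected target distribution and the estimated distribution. All identifiers, network sizes, and hyperparameters here (e.g., CategoricalQNet, V_MIN, V_MAX, N_ATOMS) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical C51-style sketch; names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51          # fixed support boundaries
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)
SUPPORT = torch.linspace(V_MIN, V_MAX, N_ATOMS)  # atoms z_0 .. z_{N-1}

class CategoricalQNet(nn.Module):
    """Maps an intersection state to a probability distribution over
    return atoms for each candidate signal phase (action)."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions * N_ATOMS),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        logits = self.net(state).view(-1, self.n_actions, N_ATOMS)
        return F.softmax(logits, dim=-1)  # per-action atom probabilities

def distributional_loss(online, target, s, a, r, s_next, done, gamma=0.99):
    """Cross-entropy between the projected target distribution and the
    online distribution; equal to the KL divergence up to a constant."""
    batch = s.size(0)
    with torch.no_grad():
        p_next = target(s_next)                        # (B, A, N)
        q_next = (p_next * SUPPORT).sum(-1)            # expected returns
        a_star = q_next.argmax(1)                      # greedy next action
        p_star = p_next[torch.arange(batch), a_star]   # (B, N)

        # Distributional Bellman update, clipped to the fixed boundary.
        tz = r.unsqueeze(1) + gamma * (1 - done).unsqueeze(1) * SUPPORT
        tz = tz.clamp(V_MIN, V_MAX)
        b = (tz - V_MIN) / DELTA_Z                     # fractional atom index
        lo, hi = b.floor().long(), b.ceil().long()
        # When b lands exactly on an atom, lo == hi and both interpolation
        # weights vanish; nudge the indices apart to preserve the mass.
        lo[(hi > 0) & (lo == hi)] -= 1
        hi[(lo < N_ATOMS - 1) & (lo == hi)] += 1

        # Project the target probability mass onto the neighbouring atoms.
        m = torch.zeros(batch, N_ATOMS)
        m.scatter_add_(1, lo, p_star * (hi.float() - b))
        m.scatter_add_(1, hi, p_star * (b - lo.float()))

    p = online(s)[torch.arange(batch), a]              # (B, N)
    return -(m * p.clamp(min=1e-8).log()).sum(1).mean()
```

Clamping the Bellman targets tz to [V_MIN, V_MAX] is what keeps an unusually large or small observed reward (e.g., a sudden platoon of vehicles) from dragging the estimated distribution outside its fixed support, which is the stabilizing effect the abstract attributes to the fixed boundary.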
