Reinforcement learning is an effective method for adaptive traffic signal control in urban transportation networks: as training progresses, a deep neural network learns an optimal control strategy, avoiding the limitations of traditional signal control methods. However, when applied to the sequential decision-making task of regional signal control, reinforcement learning encounters the curse of dimensionality and environmental non-stationarity. To address these limitations of traditional reinforcement learning algorithms at multiple intersections, mean field theory is applied: the traffic signal control problem across a region is modeled as interactions between each intersection and the average effect of its neighboring intersections. By decomposing the Q-function through pairwise estimation between an agent and its neighbors, this method reduces the complexity of inter-agent interactions while preserving global interactions among agents. A traffic signal control model based on Mean Field Multi-Agent Reinforcement Learning (MFMARL) was constructed, comprising two algorithms: Mean Field Q-Network Area Traffic Signal Control (MFQ-ATSC) and Mean Field Actor-Critic Network Area Traffic Signal Control (MFAC-ATSC). The model was validated on the SUMO simulation platform. Experimental results show that, across metrics such as average speed, the mean field reinforcement learning method outperforms classical signal control methods and several existing approaches.
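The mean-field decomposition described above can be illustrated with a minimal tabular sketch: instead of conditioning an intersection's Q-value on the joint actions of all other agents (exponential in the number of intersections), the agent conditions only on its own action and the mean action of its neighbors. The function name, discretization of the mean action, and hyperparameters below are illustrative assumptions, not the paper's implementation (MFQ-ATSC uses neural network function approximation rather than a table).

```python
import numpy as np

def mean_field_q_update(Q, s, a_i, neighbor_actions, r, s_next, n_actions,
                        alpha=0.1, gamma=0.9):
    """One tabular mean-field Q-learning step for a single agent (hypothetical sketch).

    Q is indexed by (state, own action, mean neighbor action), so the table
    grows linearly with n_actions instead of exponentially with the number
    of neighboring intersections.
    """
    # Mean action: empirical distribution of the neighbors' discrete actions.
    a_bar = np.bincount(neighbor_actions, minlength=n_actions) / len(neighbor_actions)
    # Discretize the mean action to its mode for tabular lookup (an assumption
    # made here for simplicity; a network would consume a_bar directly).
    a_bar_idx = int(np.argmax(a_bar))
    # Standard temporal-difference target against the mean-field Q-value.
    td_target = r + gamma * np.max(Q[s_next, :, a_bar_idx])
    Q[s, a_i, a_bar_idx] += alpha * (td_target - Q[s, a_i, a_bar_idx])
    return Q
```

With two neighbors choosing phase 2 and one choosing phase 0, the agent's update is driven by the averaged neighborhood behavior rather than the full joint action, which is what keeps the interaction complexity manageable as the region grows.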