Abstract
Multi-agent reinforcement learning, which treats each intersection as an agent, has shown great potential for coordinating multi-intersection traffic signals because of its strong adaptive capabilities. In the real world, however, intersections differ from one another, each with its own vehicle distribution and traffic pattern. Most existing methods directly append the states of neighboring intersections to the local intersection's state and optimize the cooperative policy network on the resulting synthesized global features. This indirect optimization makes it difficult to thoroughly explore the mutual interactions among intersection agents and prevents the agents from truly learning features with cooperative awareness. To address these challenges, we propose CLlight, a multi-intersection traffic signal control approach that introduces contrastive learning as a representation task and updates the policy network in two stages. In the first stage, we use policy-based or actor-critic-based reinforcement learning methods such as A2C, SAC, and PPO to train policy networks with a certain level of representational capability. In the second stage, we extract features before and after masking and reconstruct the post-masked features, which encourages the agents to learn the similarities and differences between the policies of different intersections and in turn enhances both the cooperative and the individual representation capabilities of the policy network. To the best of our knowledge, this is the first application of contrastive learning to traffic signal control. Experiments on synthetic and real-world datasets show that, across various scenarios, CLlight achieves better average travel time and average waiting time than other state-of-the-art traffic signal control methods.