Abstract

Multi-agent reinforcement learning is a promising solution to achieve intelligent traffic light control by regarding each intersection as an independent agent. However, agents encounter partial observability and environmental instability issues when learning optimal strategies. To mitigate the impacts caused by the partial observability of cooperative agents, we propose the auto-learning communication reinforcement learning (ALCORL) method based on the advantage actor–critic algorithm. ALCORL enables intersections to communicate and enhance cooperation by receiving messages from adjacent intersections in multi-intersection scenarios. Specifically, the autoencoder is introduced into ALCORL to dynamically learn communication messages instead of defining specific communication regulations. Different from most studies that control the sequential conversion of phases to improve traffic conditions, we focus on regulating the phase duration directly and scheduling the traffic light time more flexibly. We conduct extensive experiments on different-scale datasets and ever-changing traffic conditions to verify the validity of ALCORL. The experimental results show that ALCORL performs better than several state-of-the-art algorithms in all evaluation metrics.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call