Abstract
Adaptive traffic signal control (ATSC) can ease increasing congestion and relieve pressure on metropolitan transportation systems. In a large-scale road network, ATSC has a high-dimensional action space, which makes training slow and convergence difficult for conventional centralized deep reinforcement learning (DRL) approaches. Multi-agent reinforcement learning (MARL) overcomes this issue by decomposing the joint action space into several sub-spaces, with each agent searching for the optimal action in its own space. However, if all agents make decisions independently and maximize only their own rewards, the state transition probability of the environment in the Markov decision process (MDP) becomes non-stationary, and ATSC will ultimately fail to converge to the optimal policy. To enable agents to learn to cooperate, this paper proposes a novel MARL method in which a difference reward overcomes the credit assignment problem among cooperating agents. Moreover, a spatially weighted reward, which lets agents account for the rewards of their neighbors in the road network, is designed to evaluate the policies of decentralized actor networks, reinforcing cooperation among agents. Comparisons against an independent DRL approach and other multi-agent approaches in a large-scale network demonstrate that the proposed MARL approach outperforms them in terms of average reward and travel delay.
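The spatially weighted reward described above can be illustrated with a minimal sketch: each intersection agent mixes its own reward with those of its neighbors, weighted by proximity in the road network. The function name `spatially_weighted_rewards`, the exponential decay factor `alpha`, and the hop-distance matrix are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def spatially_weighted_rewards(rewards, dist, alpha=0.5):
    """Sketch of a spatially weighted reward (assumed form, not the paper's exact formula).

    rewards : (n,) array of per-agent rewards
    dist    : (n, n) array of hop distances between intersections
    alpha   : assumed decay factor; weight falls off with network distance
    """
    weights = alpha ** dist                        # self gets weight 1 (dist 0)
    weights /= weights.sum(axis=1, keepdims=True)  # normalize per agent
    return weights @ rewards                       # neighbor-aware reward per agent

# Toy example: three intersections on a line (0 - 1 - 2).
dist = np.array([[0, 1, 2],
                 [1, 0, 1],
                 [2, 1, 0]], dtype=float)
r = np.array([1.0, 0.0, -1.0])
print(spatially_weighted_rewards(r, dist))
```

With this construction, an agent whose neighbors do poorly sees a lower reward itself, which is the mechanism by which cooperation is reinforced in the actors' policy evaluation.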