This paper addresses the challenges of deploying reinforcement learning (RL) models for traffic signal control (TSC) in real-world environments. Training in the real world avoids the mismatch between simulation environments and actual traffic conditions, and can therefore yield better agent performance upon deployment. However, unconstrained exploration by agents during real-world training can disrupt traffic operations. To mitigate this, this paper proposes a reference mechanism to guide the decision-making process within the RL framework. A reference timing policy, typically a model-based signal strategy, is integrated into the learning process to provide agents with reference actions. Specifically, an additional Q-value function is introduced to evaluate both the agent’s actions and those of the reference policy, allowing for adjustments before the actions are executed in the real traffic system. Numerical results indicate that the reference mechanism effectively enhances system performance early in the training process, thus accelerating learning. We also combine the reference RL method with a pretraining procedure and, separately, with a jump-start algorithm. Experimental results show that both combinations further enhance system performance and facilitate real-world training.
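The reference mechanism can be illustrated with a minimal sketch. The abstract does not give the exact adjustment rule, so the code below assumes the simplest one: execute whichever of the two candidate actions the additional Q-value function scores higher, with ties going to the reference action. The tabular Q-function, the names `choose_executed_action` and `update_q`, and the values of `alpha` and `gamma` are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def choose_executed_action(q_values, state, agent_action, reference_action):
    """Pick the action to execute in the real system.

    Assumed rule (not from the paper): keep the agent's action only if the
    additional Q-function rates it strictly higher than the reference action.
    """
    q_agent = q_values[(state, agent_action)]
    q_reference = q_values[(state, reference_action)]
    return agent_action if q_agent > q_reference else reference_action


def update_q(q_values, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.95):
    """One-step tabular Q-learning update for the additional Q-function.

    alpha (learning rate) and gamma (discount factor) are illustrative values.
    """
    best_next = max(q_values[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q_values[(state, action)] += alpha * (target - q_values[(state, action)])


# Hypothetical usage: two signal phases (0 and 1) at a single intersection.
q = defaultdict(float)               # unseen (state, action) pairs default to 0.0
q[("queue_long", 0)] = 1.0           # agent's proposed phase
q[("queue_long", 1)] = 2.0           # phase suggested by the reference policy
executed = choose_executed_action(q, "queue_long", agent_action=0, reference_action=1)
```

Under this assumed rule, the agent falls back on the reference policy while its own actions are still poorly rated, which matches the stated goal of limiting disruptive exploration early in training.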