Abstract

Dear Editor, This letter presents a novel method to tackle two challenges of centralized traffic control based on reinforcement learning (RL): the curse of dimensionality as the traffic grid scales up, and the sample inefficiency that demands large amounts of data to learn. First, we use a sequence-to-sequence (seq2seq) model with an attention mechanism to decompose the state-action space into sub-spaces, addressing the first challenge. Second, we propose a new context-based meta-RL model that disentangles task inference from control, which improves meta-training efficiency and accelerates learning in new environments. We evaluate our approach on real-world datasets, and the results demonstrate that it outperforms both state-of-the-art deep reinforcement learning (DRL)-based methods and traditional control methods.
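To make the decomposition idea concrete, the sketch below shows scaled dot-product attention of the kind a seq2seq decoder could use to weight per-intersection state encodings when emitting one sub-action at a time. This is a minimal NumPy illustration under our own assumptions (variable names, dimensions, and the random encodings are hypothetical), not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over attention scores.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    """Scaled dot-product attention: the decoder query weights each
    intersection's encoded state by relevance to the current sub-action."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)      # one score per intersection
    weights = softmax(scores)                 # weights sum to 1
    return weights @ values, weights          # context vector, weights

rng = np.random.default_rng(0)
n_intersections, d_model = 4, 8
enc = rng.normal(size=(n_intersections, d_model))  # encoder output per intersection (hypothetical)
q = rng.normal(size=(d_model,))                    # decoder hidden state for the current step
context, w = attention(q, enc, enc)
print(context.shape)  # context vector has shape (d_model,)
```

In a seq2seq controller of this style, the decoder would repeat this step once per intersection, so each sub-action is chosen against a focused context rather than the full joint state, which is what sidesteps the exponential growth of the joint action space.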
