Abstract

Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under the congestion caused by incidents. However, existing applications are limited to single-agent tasks and, being based on Q-learning, have inherent drawbacks for dealing with coordinated ramp control problems. To solve these problems, a Dyna-Q based multiagent reinforcement learning (MARL) system named Dyna-MARL has been developed in this paper. Dyna-Q is an extension of Q-learning that combines model-free and model-based methods to obtain benefits from both sides. The performance of Dyna-MARL is tested on a simulated motorway segment in the UK with real traffic data collected during AM peak hours. The test results, compared with Isolated RL and non-controlled situations, show that Dyna-MARL achieves superior performance in improving traffic operation: it increases total throughput and reduces total travel time and CO2 emissions. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel time of road users from different on-ramps.
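As context for the abstract, the sketch below shows the plain tabular Q-learning update that Dyna-Q extends. It is a minimal illustration only: the learning rate, discount factor, and the state/action names are assumed placeholders, not the definitions used in the paper (those are given in the "Definition of RL Elements" section).

```python
# Minimal tabular Q-learning update (illustrative sketch, not the paper's design).
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def q_learning_update(state, action, reward, next_state, actions):
    """One model-free Q-learning step, learned from a single real interaction."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```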

Highlights

  • Traffic congestion occurs when the traffic demand for a road network approaches or exceeds its available road capacity

  • This segment is between junction 21A (J21A) and junction 25 (J25) with an approximate length of 12.4 km

  • A Dyna-Q based multiagent reinforcement learning method referred to as Dyna-MARL for motorway ramp control has been developed in this paper

Summary

Introduction

Traffic congestion occurs when the traffic demand for a road network approaches or exceeds its available road capacity. One group of ramp control systems aims to make the outflow of the motorway mainline approach a predetermined target value, which is usually the road capacity. Another group of systems focuses on formulating different control scenarios as optimisation problems and using optimal control techniques (e.g., model predictive control) to solve them. Q-learning can only learn from real interactions with the traffic operation and cannot make full use of historical data (or models). Because of this limitation, Q-learning usually has a low learning speed and needs a great number of trials to obtain the best control strategy in some complex scenarios, such as incident-induced congestion [27]. One solution to speed up the learning process and deal with incidents efficiently has been proposed in our previous work [27, 29]. That system used the Dyna-Q architecture to combine model-free Q-learning with a model-based method and can be used to accomplish single-agent tasks.
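For readers unfamiliar with the Dyna-Q architecture mentioned above, the sketch below illustrates the generic Dyna-Q loop: every real interaction updates the value function directly (model-free) and also updates a learned model, which is then replayed for a number of planning steps (model-based). The environment interface, hyperparameter values, and number of planning steps are assumptions for illustration, not the paper's exact ramp control design.

```python
# Hedged sketch of the generic Dyna-Q loop that the paper builds on.
import random
from collections import defaultdict

ALPHA, GAMMA, N_PLANNING = 0.1, 0.9, 20   # assumed hyperparameters

Q = defaultdict(float)   # Q[(state, action)] -> estimated return
model = {}               # (state, action) -> (reward, next_state) seen in experience
observed = []            # visited (state, action) pairs available for planning

def update(s, a, r, s_next, actions):
    """Standard Q-learning update rule, shared by the direct and planning steps."""
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])

def dyna_q_step(s, a, r, s_next, actions):
    # (1) Direct RL: learn from the real interaction.
    update(s, a, r, s_next, actions)
    # (2) Model learning: remember what this action led to.
    model[(s, a)] = (r, s_next)
    observed.append((s, a))
    # (3) Planning: replay simulated experiences drawn from the learned model.
    for _ in range(N_PLANNING):
        ps, pa = random.choice(observed)
        pr, ps_next = model[(ps, pa)]
        update(ps, pa, pr, ps_next, actions)
```

The planning loop is what lets Dyna-Q reuse past (or historical) experience instead of relying solely on new interactions, which is the speed-up the introduction refers to.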

Reinforcement Learning
Dyna-Q Based Indirect Coordination Strategy
Modified Asymmetric Cell Transmission Model
Definition of RL Elements
Case Study and Results
Conclusions and Future Work