Abstract

Reinforcement learning has potential in the area of intelligent transportation because of its generality and real-time capability. The Q-learning algorithm, an early reinforcement learning algorithm, has its own merits for solving the train timetable rescheduling (TTR) problem, but it suffers from two shortcomings: a dimensional limit on actions and a slow convergence rate. In this paper, a deep deterministic policy gradient (DDPG) algorithm is applied to solve the energy-aimed train timetable rescheduling (ETTR) problem. As a reinforcement learning algorithm, it fulfills the real-time requirement of the ETTR problem and adapts to random disturbances. Unlike Q-learning, DDPG operates over continuous state and action spaces. After sufficient training, the DDPG-based learning agent takes proper actions by continuously adjusting the cruising speed and the dwelling time of each train in a metro network when random disturbances happen. Although training requires iterating over thousands of episodes, the policy decision in each testing episode takes very little time. Models of the metro network, based on a real case of Shanghai Metro Line 1, are established as the training and testing environment. To validate the energy-saving effect and the real-time performance of the proposed algorithm, four experiments are designed and conducted. Compared with a no-action strategy, the results show that the proposed algorithm performs in real time and saves a significant percentage of energy under random disturbances.
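The abstract describes the agent's continuous action as an adjustment to each train's cruising speed and dwelling time. The sketch below illustrates how such a bounded continuous action could be mapped onto timetable variables; the bound values and the function name are assumptions for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical adjustment bounds (illustrative only, not from the paper).
SPEED_ADJ_MAX = 2.0   # max cruising-speed adjustment, m/s (assumed)
DWELL_ADJ_MAX = 10.0  # max dwelling-time adjustment, s (assumed)

def apply_action(cruise_speed, dwell_time, action):
    """Map a raw actor output in [-1, 1]^2 to bounded timetable changes.

    This mirrors the common DDPG practice of emitting actions in a
    normalized range and scaling them to physical units.
    """
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    new_speed = cruise_speed + a[0] * SPEED_ADJ_MAX
    new_dwell = max(0.0, dwell_time + a[1] * DWELL_ADJ_MAX)  # dwell stays non-negative
    return float(new_speed), float(new_dwell)
```

Because the action space is continuous, the agent can make arbitrarily fine adjustments, which is exactly the advantage over Q-learning's discretized actions noted in the abstract.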

Highlights

  • Nowadays, artificial intelligence (AI) has successfully been used for understanding human speech [1,2], competing at a high level in strategic game systems, driving vehicles autonomously [6,7], and interpreting complex data [8,9]

  • Reinforcement learning (RL) [10,11], which is a vital branch of AI, has potential in the area of intelligent transportation

  • There are two advantages of RL: First, due to its generality, agents can effectively learn across many disciplines in a complex environment such as a metro network [12,13,14]; second, an agent that has fully explored the environment can make proper decisions in real time, which means that RL can be used in optimization problems with real-time requirements

Introduction

Artificial intelligence (AI) has successfully been used for understanding human speech [1,2], competing at a high level in strategic game systems (such as Chess [3] and Go [4,5]), driving vehicles autonomously [6,7], and interpreting complex data [8,9]. Prior work has applied RL-based algorithms to calculate optimal decisions that minimize a combined cost of total time delay and energy consumption. Both studies are based on the Q-learning algorithm [21,22], which belongs to RL. In this paper, DDPG is applied to solve the ETTR problem. DDPG is a model-free, off-policy actor-critic algorithm that uses deep function approximators to learn policies in high-dimensional, continuous action spaces [23]. It has been successfully applied in fields such as robotic control [27] and traffic light timing optimization [28].
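The actor-critic structure mentioned above pairs a deterministic policy network mu(s), which outputs a continuous action, with a Q-network Q(s, a), which scores that action and provides the gradient for the policy update. A minimal numpy forward-pass sketch follows; the layer sizes, state/action dimensions, and weight initialization are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions for illustration (not from the paper).
STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 16

# Actor weights: a small MLP whose tanh output keeps actions in [-1, 1].
W1 = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, ACTION_DIM))

def actor(state):
    """Deterministic policy mu(s): maps a state to a continuous action."""
    h = np.tanh(state @ W1)
    return np.tanh(h @ W2)

# Critic weights: Q(s, a) takes the state and action concatenated.
V1 = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, HIDDEN))
v2 = rng.normal(scale=0.1, size=HIDDEN)

def critic(state, action):
    """Action-value Q(s, a): the scalar the actor update ascends."""
    h = np.tanh(np.concatenate([state, action]) @ V1)
    return float(h @ v2)

s = np.zeros(STATE_DIM)
a = actor(s)       # continuous action in [-1, 1]^ACTION_DIM
q = critic(s, a)   # scalar value; its gradient w.r.t. a trains the actor
```

In full DDPG, both networks also have slowly updated target copies and are trained from a replay buffer, which is what makes the algorithm off-policy.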

Principles of Deep Deterministic Policy Gradient
Model of Train Traffic
Model of Energy Consumption
Model of Train Movement
Relation
Environment and Agent
Action
Rewards
Experimental
Conclusions