Abstract

Single-train trajectory optimization, also known as speed profile optimization (SPO), is a classical problem of minimizing the traction energy consumption of trains. As an optimization method, reinforcement learning (RL) has been applied to the SPO problem. In the learning process of a typical RL algorithm, a soft constraint (a penalty term) is commonly used to keep the agent away from unsafe states. However, a soft constraint can neither guarantee nor explain the safety of the result. For the SPO problem, this means that the optimized speed profile obtained by plain RL may violate the speed limit, which is unacceptable in practice. This paper proposes a protection mechanism called Shield and constructs a Shield SARSA (S-SARSA) algorithm to protect the learning process of a high-speed train. Four different reward functions are used to compare the protective efficacy of the proposed algorithm with that of the soft constraint. Numerical experiments based on line data from Wuxi East to Suzhou North verify its protective efficacy and effectiveness.
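The abstract does not detail the Shield mechanism itself; the following is a minimal illustrative sketch of the general idea of shielded RL in the SPO setting: unsafe actions (those that would push the train over the speed limit) are filtered out before action selection, so the SARSA update only ever sees safe transitions. All names, the action set, and the kinematic step are assumptions for illustration, not the paper's implementation.

```python
import random

# Illustrative discrete control actions (brake, coast, accelerate); not from the paper.
ACTIONS = [-1.0, 0.0, 1.0]

def next_speed(v, a, dt=1.0):
    """Hypothetical one-step speed prediction used by the shield."""
    return max(0.0, v + a * dt)

def shield(v, speed_limit, actions=ACTIONS):
    """Keep only actions whose predicted successor speed stays within the limit.

    If no action is safe, fall back to the strongest braking action.
    """
    safe = [a for a in actions if next_speed(v, a) <= speed_limit]
    return safe if safe else [min(actions)]

def epsilon_greedy(Q, state, safe_actions, eps=0.1):
    """Epsilon-greedy selection restricted to the shielded (safe) action set."""
    if random.random() < eps:
        return random.choice(safe_actions)
    return max(safe_actions, key=lambda a: Q.get((state, a), 0.0))

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """Standard on-policy SARSA update; only shielded actions ever reach it."""
    td_target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
```

In contrast to a soft constraint, which only adds a penalty to the reward after an unsafe state is visited, this kind of shield prevents the unsafe action from being taken at all, which is what allows a hard guarantee on the speed limit during learning.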
