Abstract
Train trajectory optimization (TTO) is an effective way to reduce energy consumption in rail transit. Reinforcement learning (RL), a powerful optimization method, has been applied to TTO problems. Although traditional RL algorithms use penalty functions to restrict the random exploration behavior of agents, they cannot fully guarantee the safety of the optimization process and its results. This paper proposes a safe reinforcement learning framework based on proximal policy optimization (S-PPO) for train trajectory optimization, comprising a safe action rechoosing mechanism (SARM) and a relaxed dynamic reward mechanism (RDRM) that combines a relaxed sparse reward with a dynamic dense reward. SARM guarantees that the new states generated by the agent consistently satisfy the environmental safety constraints, thereby improving sampling efficiency and facilitating convergence. RDRM makes it easier for agents to obtain successful samples by relaxing the time constraints, and it also offers a better balance between exploration and exploitation. Experimental results show that S-PPO significantly improves performance and obtains better train operation trajectories than soft-constraint methods, with a smoother convergence process. Finally, S-PPO is shown to adapt well to tracks with various speed limits.
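To make the safe action rechoosing idea concrete, the sketch below shows one plausible interpretation in a toy one-dimensional train environment: before committing to an action sampled from the policy, the predicted next state is checked against a speed-limit constraint, and the action is resampled if the check fails. `SimpleTrainEnv`, `is_safe`, `predict`, the physics constants, and the placeholder random policy are all hypothetical simplifications, not the paper's implementation.

```python
# Illustrative sketch of a safe action rechoosing step inside an RL sampling loop.
# All names and constants here are assumptions for the example, not the paper's code.
import random


class SimpleTrainEnv:
    """Toy 1-D train environment: state = (position, speed), action = acceleration."""

    def __init__(self, track_length=1000.0, speed_limit=20.0, dt=1.0):
        self.track_length = track_length
        self.speed_limit = speed_limit  # m/s, assumed constant along the track
        self.dt = dt
        self.position = 0.0
        self.speed = 0.0

    def reset(self):
        self.position, self.speed = 0.0, 0.0
        return (self.position, self.speed)

    def predict(self, action):
        """Predict the next state for a candidate action without committing to it."""
        speed = max(0.0, self.speed + action * self.dt)
        position = self.position + speed * self.dt
        return (position, speed)

    def is_safe(self, state):
        """Check the environmental safety constraint (here: the speed limit)."""
        _, speed = state
        return 0.0 <= speed <= self.speed_limit

    def step(self, action):
        self.position, self.speed = self.predict(action)
        done = self.position >= self.track_length
        return (self.position, self.speed), done


def sample_safe_action(env, policy, state, max_retries=10):
    """Rechoose actions until the predicted next state satisfies the constraints."""
    for _ in range(max_retries):
        action = policy(state)
        if env.is_safe(env.predict(action)):
            return action
    # Fall back to a conservative action (braking) if no safe sample is found.
    return -1.0


def random_policy(state):
    """Placeholder for the PPO actor: samples an acceleration in m/s^2."""
    return random.uniform(-1.0, 1.0)


if __name__ == "__main__":
    env = SimpleTrainEnv()
    state = env.reset()
    done = False
    while not done:
        action = sample_safe_action(env, random_policy, state)
        state, done = env.step(action)
    print(f"Trip finished at position {state[0]:.1f} m, final speed {state[1]:.1f} m/s")
```

Because every transition stored in the replay buffer satisfies the constraint by construction, the agent never has to learn from unsafe samples, which is the intuition behind the improved sampling efficiency the abstract describes.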