Abstract

This paper investigates the reliable shortest path (RSP) planning problem from the reinforcement learning perspective. Unlike canonical RSP planning methods, which require at least the first-order statistic (mean) and second-order statistic (variance) of the travel time distribution, we target the RSP planning problem without assuming any prior knowledge of the travel time distribution, and propose a cascaded temporal difference (CTD) learning method. CTD simultaneously estimates the mean and variance of the travel time along the path being executed, and thereby makes gradual improvements through the generalized policy iteration (GPI) scheme as the ego vehicle interacts with the environment. Extensive simulation results demonstrate the applicability of the proposed method for RSP learning in various transportation networks.
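To make the abstract's idea concrete, the sketch below shows one plausible tabular form of a cascaded TD learner combined with GPI. It is an illustration under stated assumptions, not the paper's actual update rules: the names (CascadedTD, J, M), the mean-plus-weighted-standard-deviation reliability score, the second-moment formulation of the variance estimate, and the independence assumption on link travel times are all choices made for this sketch.

```python
import random
from collections import defaultdict

# Hypothetical sketch of a cascaded TD (CTD) learner for reliable shortest
# paths. The first TD learner tracks the mean remaining travel time J(s, a);
# the second, cascaded learner tracks the second moment M(s, a), from which a
# variance estimate is recovered as Var(s, a) = M(s, a) - J(s, a)**2.
class CascadedTD:
    def __init__(self, actions, alpha=0.1, lam=1.0, epsilon=0.1):
        self.J = defaultdict(float)   # mean travel-time estimate per (node, next-node)
        self.M = defaultdict(float)   # second-moment estimate per (node, next-node)
        self.actions = actions        # actions(s) -> list of admissible successor nodes
        self.alpha = alpha            # TD step size
        self.lam = lam                # risk weight in "mean + lam * std" reliability score
        self.epsilon = epsilon        # exploration rate used in the improvement step

    def score(self, s, a):
        # Reliability score: estimated mean plus a weighted standard deviation.
        var = max(self.M[(s, a)] - self.J[(s, a)] ** 2, 0.0)
        return self.J[(s, a)] + self.lam * var ** 0.5

    def act(self, s):
        # Policy improvement step of GPI: epsilon-greedy on the reliability score.
        acts = self.actions(s)
        if random.random() < self.epsilon:
            return random.choice(acts)
        return min(acts, key=lambda a: self.score(s, a))

    def update(self, s, a, cost, s_next, done):
        # Policy evaluation step of GPI: cascaded TD updates after observing
        # the realized travel time `cost` on the traversed link.
        if done:
            target_mean, target_m2 = cost, cost ** 2
        else:
            a_next = min(self.actions(s_next), key=lambda b: self.score(s_next, b))
            target_mean = cost + self.J[(s_next, a_next)]
            # Second moment of (cost + downstream time), assuming independent
            # link travel times (an assumption of this sketch only).
            target_m2 = (cost ** 2
                         + 2.0 * cost * self.J[(s_next, a_next)]
                         + self.M[(s_next, a_next)])
        self.J[(s, a)] += self.alpha * (target_mean - self.J[(s, a)])
        self.M[(s, a)] += self.alpha * (target_m2 - self.M[(s, a)])
```

In this reading, the "cascade" is that the second-moment update reuses the current mean estimate, and no distributional parameters are supplied in advance; both statistics are learned purely from interaction, which is consistent with the model-free setting the abstract describes.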
