Abstract

This paper investigates the reliable shortest path (RSP) planning problem from the reinforcement learning perspective. Unlike canonical RSP planning methods, which require at least the first-order statistic (mean) and second-order statistic (variance) of the travel time distribution, we target the RSP planning problem without assuming any prior knowledge of the travel time distribution, and propose a cascaded temporal difference (CTD) learning method. CTD simultaneously estimates the mean and variance of the travel time along the path being executed, and thereby makes gradual improvements through the generalized policy iteration (GPI) scheme as the ego vehicle interacts with the environment. Extensive simulation results demonstrate the applicability of the proposed method for RSP learning in various transportation networks.
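To make the abstract's idea concrete, the sketch below shows one plausible tabular form of a cascaded TD learner combined with GPI. It is an illustration under stated assumptions, not the paper's actual update rules: the names (CascadedTD, J, M), the mean-plus-weighted-standard-deviation reliability score, the second-moment formulation of the variance estimate, and the independence assumption on link travel times are all choices made for this sketch.

```python
import random
from collections import defaultdict

# Hypothetical sketch of a cascaded TD (CTD) learner for reliable shortest
# paths. The first TD learner tracks the mean remaining travel time J(s, a);
# the second, cascaded learner tracks the second moment M(s, a), from which a
# variance estimate is recovered as Var(s, a) = M(s, a) - J(s, a)**2.
class CascadedTD:
    def __init__(self, actions, alpha=0.1, lam=1.0, epsilon=0.1):
        self.J = defaultdict(float)   # mean travel-time estimate per (node, next-node)
        self.M = defaultdict(float)   # second-moment estimate per (node, next-node)
        self.actions = actions        # actions(s) -> list of admissible successor nodes
        self.alpha = alpha            # TD step size
        self.lam = lam                # risk weight in "mean + lam * std" reliability score
        self.epsilon = epsilon        # exploration rate used in the improvement step

    def score(self, s, a):
        # Reliability score: estimated mean plus a weighted standard deviation.
        var = max(self.M[(s, a)] - self.J[(s, a)] ** 2, 0.0)
        return self.J[(s, a)] + self.lam * var ** 0.5

    def act(self, s):
        # Policy improvement step of GPI: epsilon-greedy on the reliability score.
        acts = self.actions(s)
        if random.random() < self.epsilon:
            return random.choice(acts)
        return min(acts, key=lambda a: self.score(s, a))

    def update(self, s, a, cost, s_next, done):
        # Policy evaluation step of GPI: cascaded TD updates after observing
        # the realized travel time `cost` on the traversed link.
        if done:
            target_mean, target_m2 = cost, cost ** 2
        else:
            a_next = min(self.actions(s_next), key=lambda b: self.score(s_next, b))
            target_mean = cost + self.J[(s_next, a_next)]
            # Second moment of (cost + downstream time), assuming independent
            # link travel times (an assumption of this sketch only).
            target_m2 = (cost ** 2
                         + 2.0 * cost * self.J[(s_next, a_next)]
                         + self.M[(s_next, a_next)])
        self.J[(s, a)] += self.alpha * (target_mean - self.J[(s, a)])
        self.M[(s, a)] += self.alpha * (target_m2 - self.M[(s, a)])
```

In this reading, the "cascade" is that the second-moment update reuses the current mean estimate, and no distributional parameters are supplied in advance; both statistics are learned purely from interaction, which is consistent with the model-free setting the abstract describes.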
