This paper studies the reliable shortest path (RSP) problem in stochastic transportation networks. State-of-the-art RSP solutions usually target a single, specific RSP objective; moreover, the corresponding algorithms' computational complexity scales at least linearly with the size of the underlying transportation network. In this paper, we propose a graph embedding and deep distributional reinforcement learning (GE-DDRL) method, which serves as a universal and scale-free solution to the RSP problem. GE-DDRL uses deep distributional reinforcement learning (DDRL) to estimate the full travel-time distribution of a given routing policy, and improves that policy via the generalized policy iteration (GPI) scheme. Further, to generalize to new destination nodes, we employ a canonical graph embedding technique, Skip-Gram, to compress each node's representation into a $d$-dimensional real-valued vector. With these compact node features, GE-DDRL generalizes its estimate of the routing policy's travel-time distribution to untrained destination nodes, and hence achieves 'all-to-all' navigation. To the best of our knowledge, GE-DDRL is the first RSP planner that applies to almost all RSP objectives while remaining scale-free with respect to the size of the transportation network in terms of online decision-making time and memory complexity. Experimental results and comparisons with state-of-the-art methods demonstrate the efficacy and efficiency of GE-DDRL across a range of transportation networks.
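To make the node-embedding step concrete, the sketch below illustrates the general Skip-Gram idea the abstract refers to: random walks on a graph are treated as "sentences", and Skip-Gram learns a $d$-dimensional vector per node. This is a minimal illustration, not the paper's implementation; the toy graph, walk parameters, embedding dimension `d`, and the use of gensim's `Word2Vec` (with `sg=1` selecting Skip-Gram) are all assumptions made here for demonstration.

```python
# Minimal Skip-Gram node-embedding sketch (illustrative, not the paper's code).
import random
from gensim.models import Word2Vec

# Toy transportation network as an adjacency list (node -> neighbors); assumed.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

def random_walk(graph, start, length):
    """Uniform random walk of the given length, used as one 'sentence'."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# Build a corpus of walks so each node appears in many local contexts.
walks = [random_walk(graph, node, length=10)
         for node in graph for _ in range(50)]

# Skip-Gram (sg=1) compresses each node into a d-dimensional real-valued vector.
d = 8  # embedding dimension; an illustrative choice
model = Word2Vec(walks, vector_size=d, window=3, sg=1, min_count=0, epochs=5)

print(model.wv["A"])  # d-dimensional embedding of node "A"
```

In a setting like the one described, such node vectors would serve as the compact state features consumed by the distributional value network, so the planner's per-step cost depends on $d$ rather than on the network's size.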