Transportation simulation is non-trivial because thousands of heterogeneous decision makers (i.e., vehicles) coexist and interact. Such large-scale decision making is intrinsically a complex decentralized problem whose resolution is at the forefront of transportation simulation. Although many physical and mathematical models have been proposed to date, most rely on a set of universal rules plus random perturbations to characterize vehicular movements. Inspired by the decision-making mechanism of rational human beings (i.e., learning iteratively from experience), this study proposes a novel two-stage ensemble reinforcement learning (TERL) paradigm for large-scale decentralized decision making in transportation simulation, with the goal of improving the computational efficiency, and hence the scalability, of reinforcement learning (RL) for practical applications. After establishing a problem-specific Markov decision process, the first stage uses clustering to group heterogeneous vehicles into quasi-homogeneous clusters. A representative RL model (or agent) is assigned to each cluster, and the vehicles within a cluster share and jointly optimize its policy parameters. The second stage develops an ensemble control strategy, built on the representative RL models, for vehicles isolated as noise during clustering. While all vehicles are simulated simultaneously, the RL models are trained separately with cluster-specific experience replay. Applying TERL to the classical multi-user dynamic route choice problem on a real-world network of the Gusu District in Suzhou, China demonstrates that the proposed approach yields desirable simulation results compared with the classical shortest-path model and the dynamic user equilibrium model.
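The abstract does not give implementation details, so the following Python sketch is only a minimal illustration of the two-stage flow under stated assumptions: DBSCAN as the clustering step (chosen here because it labels outliers as noise, matching the "isolated as noise" phrasing), tabular Q-learning as each cluster's representative model, and inverse-distance weighting as one possible ensemble rule. All names (e.g., `vehicle_features`, `ensemble_action`) and parameter values are hypothetical, not taken from the paper.

```python
# Minimal sketch of the two-stage TERL control flow described in the abstract.
# Assumptions (not specified in the abstract): DBSCAN clustering, tabular
# Q-learning per cluster, and inverse-distance ensemble weighting.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Stage 1: group heterogeneous vehicles into quasi-homogeneous clusters.
vehicle_features = rng.normal(size=(200, 4))  # hypothetical per-vehicle descriptors
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(vehicle_features)
clusters = sorted(set(labels) - {-1})         # label -1 marks vehicles isolated as noise

N_STATES, N_ACTIONS = 10, 3                   # toy route-choice MDP sizes
q_tables = {c: np.zeros((N_STATES, N_ACTIONS)) for c in clusters}
centroids = {c: vehicle_features[labels == c].mean(axis=0) for c in clusters}

def cluster_action(c, state, eps=0.1):
    """Epsilon-greedy action from the representative model shared by cluster c."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_tables[c][state]))

def ensemble_action(vehicle_idx, state):
    """Stage 2: combine the representative models for a noise vehicle,
    weighting each cluster by inverse distance to its centroid (an assumption)."""
    weights = {c: 1.0 / (1e-6 + np.linalg.norm(vehicle_features[vehicle_idx] - mu))
               for c, mu in centroids.items()}
    scores = sum(w * q_tables[c][state] for c, w in weights.items())
    return int(np.argmax(scores))

# Cluster-specific experience replay: every vehicle in a cluster deposits its
# transitions into that cluster's buffer, and the shared model trains on it.
replay = {c: [] for c in clusters}

def store(c, s, a, r, s_next):
    replay[c].append((s, a, r, s_next))

def train(c, alpha=0.1, gamma=0.95, batch=32):
    """One Q-learning update pass over a random mini-batch from cluster c's buffer."""
    if not replay[c]:
        return
    idx = rng.integers(len(replay[c]), size=min(batch, len(replay[c])))
    for s, a, r, s_next in (replay[c][i] for i in idx):
        target = r + gamma * q_tables[c][s_next].max()
        q_tables[c][s, a] += alpha * (target - q_tables[c][s, a])
```

In this reading, vehicles are stepped through the simulator together, but each cluster's model is updated only from its own buffer, which is one way to realize the paper's "cluster-specific experience replay"; the actual models, clustering method, and ensemble rule used in the study may differ.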