Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window

Zefang Zong, Yong Li, Xia Tong, Meng Zheng

doi:10.1145/3625232

Abstract

The vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-quality solutions within limited inference time, which raises three major challenges: (a) directly optimizing the goal under several practical constraints; (b) efficiently handling individual time-window limits; and (c) modeling the cooperation among the vehicle fleet. In this article, we present an end-to-end reinforcement learning (RL) framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input and applies a harsh policy to the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time-window limits during gradient propagation. Third, we design a task handler to enable cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves performance by up to 11.7% compared to other RL baselines and can generate solutions for instances within seconds while maintaining solution quality, whereas existing heuristic baselines take minutes. Moreover, we thoroughly analyze our solution and discuss the practical implications of its real-time response ability.
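As a rough illustration of what a time penalty augmented reward could look like, the sketch below scores a single vehicle route as the negative of its travel distance plus a weighted lateness term. The abstract does not specify the formulation, so everything here is an assumption made for illustration: the function name `route_reward`, the coefficient `beta`, Euclidean distances, a constant `speed`, waiting at a customer until its window opens, and soft (penalized rather than forbidden) late arrivals.

```python
import math


def route_reward(route, coords, windows, depot=0, speed=1.0, beta=1.0):
    """Hypothetical reward: -(travel distance + beta * total lateness).

    route   -- customer ids in visiting order (depot excluded)
    coords  -- dict mapping node id to (x, y)
    windows -- dict mapping customer id to (earliest, latest) service time
    """
    pos, t = depot, 0.0
    distance, lateness = 0.0, 0.0
    for node in route:
        leg = math.dist(coords[pos], coords[node])
        distance += leg
        t += leg / speed
        earliest, latest = windows[node]
        t = max(t, earliest)              # assumed: wait if arriving early
        lateness += max(0.0, t - latest)  # assumed: soft time-window penalty
        pos = node
    distance += math.dist(coords[pos], coords[depot])  # return to depot
    return -(distance + beta * lateness)


coords = {0: (0, 0), 1: (3, 4), 2: (6, 8)}
windows = {1: (0, 10), 2: (0, 8)}
reward = route_reward([1, 2], coords, windows, beta=1.0)
```

In this toy instance the vehicle reaches customer 2 at time 10, two units after its window closes, so a larger `beta` lowers the reward further and pushes a learned policy toward on-time routes.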

Full Text