Abstract

This paper studies a dynamic vehicle routing problem under stochastic demands, drawn from a real-world situation. Specifically, a single courier must accomplish two kinds of tasks: deliveries known at the beginning of the operation and pickups that appear throughout the daily operation with specific patterns. The objective is to maximise the rewards obtained from serving both types of customers during a limited period. Our contribution lies in using a neural network and couriers' historical decisions to learn a base policy that captures human experience for better decision making. A reinforcement learning framework then lets the base policy explore new scenarios through simulations and further trains it with the newly generated data. We show that, under certain conditions, our approach serves on average 12% and 8% more customers than the nearest-neighbour policy in high-density and low-density areas, respectively.
