Abstract

In this paper we consider a class of dynamic vehicle routing problems in which a number of mobile agents in the plane must visit target points generated over time by a stochastic process. The goal is to design motion coordination strategies that minimize the expected time between the appearance of a target point and the time it is visited by one of the agents. We cast the problem as a spatial game in which each agent's objective is to maximize the expected value of the “time spent alone” at the next target location, and we show that the Nash equilibria of the game correspond to the desired agent configurations. We propose learning-based control strategies that, while making minimal or no assumptions about communication between agents or about the underlying distribution, provide the same level of steady-state performance as the best known decentralized strategies.

