This paper studies the resource allocation problem for delivering on-site services in urban areas. Service requests arrive spontaneously, and workers must be assigned to them dynamically. Real-life examples include dispatching traffic officers to accident scenes and deploying mechanics to maintenance sites. We solve the dynamic assignment problem with a policy gradient approach that assigns workers to locations so as to minimize the delay experienced by each customer. Our solution framework uses a transformer architecture with inter-task and inter-agent communication layers as the approximator, trained with the vanilla policy gradient algorithm. To improve computational effectiveness and make the action space more flexible, we introduce an option to withhold an assignment: a worker need not be assigned at a decision point even if a service request has been received. We conduct extensive computational experiments with varying numbers of orders, order frequencies, and degrees of spatial sparsity. The proposed method outperforms the benchmark methods, including a genetic algorithm and other online heuristics, in terms of stability of performance, computational efficiency, and solution quality. The experimental results also suggest that its advantage over the benchmark algorithms diminishes when the on-site service time is long.