Reinforcement learning based monotonic policy for online resource allocation

Pankaj Mishra, Ahmed Moustafa

doi:10.1016/j.future.2021.09.023

Abstract

This research aims to design an optimal and strategyproof mechanism for online resource allocation problems. In such problems, consumers arrive randomly with their resource requests, so future resource demand is uncertain. In addition, the allocation and payment decisions depend on the providers’ past experiences. To address these challenges, we propose a novel reinforcement learning algorithm for optimising the resource allocation policy. The proposed algorithm adopts a novel monotonic reward shaping function that uses a dominant-resource multi-label classification technique. Finally, a critical payment value is calculated to maintain strategyproofness in the online environment. The experimental evaluations show that the proposed mechanism achieves results that are within 96% of the optimal social welfare while outperforming mechanisms that use fixed pricing.

Full Text