Abstract

For a serving system with multiple servers and a public queue, we study the scheduling of multiple tasks with deadlines, under random task arrivals and renewable energy generation. To minimize the weighted sum of the serving cost (associated with the energy consumption) and the delay cost (resulting from deferring the processing of tasks past their deadlines), we formulate the problem as a dynamic program with unknown transition probabilities. To mitigate the curse of dimensionality, we establish a partial priority rule, ED-LDF: priority should be given to tasks with earlier deadlines and lower demands. In the heavy-traffic regime, the established ED-LDF characterization is proved to be optimal under arbitrary system dynamics. We propose a new, scalable ED-LDF-based proximal policy optimization (PPO) approach that integrates our (partial) optimal policy characterizations into a state-of-the-art deep reinforcement learning algorithm. Numerical results demonstrate that the proposed ED-LDF-based PPO approach outperforms the classical PPO and three other priority-rule-based PPO approaches.
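The ED-LDF rule described above is partial: one task takes priority over another only when it is no worse on both criteria (earlier or equal deadline, and lower or equal demand); incomparable pairs are left to the learned policy. A minimal sketch of this dominance check, assuming hypothetical task attributes `deadline` and `demand` (the paper's actual task model and units are not specified here):

```python
from dataclasses import dataclass

@dataclass
class Task:
    deadline: float  # time by which the task should be completed (assumed attribute)
    demand: float    # remaining service requirement (assumed attribute)

def ed_ldf_dominates(a: Task, b: Task) -> bool:
    """Return True if task `a` takes priority over task `b` under the
    partial ED-LDF rule: `a` has an earlier (or equal) deadline and a
    lower (or equal) demand. Pairs where neither dominates the other
    are incomparable and would be resolved by the learned PPO policy."""
    return a.deadline <= b.deadline and a.demand <= b.demand

# Example: the first task dominates the second (earlier deadline, lower demand);
# the third is incomparable with the first (later deadline but lower demand).
t1 = Task(deadline=3.0, demand=2.0)
t2 = Task(deadline=5.0, demand=4.0)
t3 = Task(deadline=4.0, demand=1.0)
print(ed_ldf_dominates(t1, t2))  # → True
print(ed_ldf_dominates(t1, t3), ed_ldf_dominates(t3, t1))  # → False False
```

This sketch only illustrates the priority relation itself; how the rule is embedded into the PPO architecture follows the approach described in the abstract and is not reproduced here.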
