5G wireless networks are expected to satisfy the diverse delay requirements of various traffic types through network resource scheduling. Existing scheduling methods perform poorly in practice because they rely on unrealistic assumptions, such as access to full channel state information (CSI) or an explicit mathematical expression for network delay. In this paper, we consider the delay-oriented packet scheduling problem in multi-cell 5G downlink networks with multiple users and traffic types (e.g., FTP, VoIP, and video streaming), and formulate it as a partially observable Markov decision process (POMDP). We design a delay-oriented downlink scheduling framework based on deep reinforcement learning (DRL) that autonomously schedules active traffic flows without full channel information. Furthermore, we propose a recurrent proximal policy optimization (RPPO) algorithm to infer the underlying state and accelerate learning across different time granularities, and we rigorously prove the policy gradient theorem under the POMDP setting. By incorporating future traffic information provided by a proposed spatial-temporal prediction algorithm, RPPO balances the load and achieves lower delay in real-time multi-cell, multi-user scenarios. Extensive experiments on a realistic 5G simulator demonstrate that our framework significantly outperforms existing approaches, reducing tail delay and average delay by up to 48% and 41.7%, respectively.