Joint modulation and coding scheme (MCS) adaptation and resource block (RB) allocation is an effective approach to guaranteeing the diverse quality of service (QoS) requirements of user equipments (UEs) in dynamic network environments. In this paper, we consider a fifth-generation (5G) cellular network with time-varying wireless channels, in which a base station (BS) serves multiple UEs with a limited number of available RBs. We aim to minimize the total RB consumption subject to strict constraints on each UE's QoS requirement. To attain this objective, this paper puts forth an online learning technique, referred to as integrated Deep Reinforcement learning and stable Matching (DeepRM), in which the MCS adaptation and RB allocation decisions are made without acquiring real-time channel quality indicator (CQI) feedback. DeepRM is a closed-loop framework: the output of deep reinforcement learning (DRL) is fed into the stable matching stage to guide optimal RB allocation, while the output of stable matching is fed back into the DRL framework to assist efficient MCS decision-making. Specifically, in DeepRM, we first develop a powerful DRL algorithm, termed Action-and-Reward Branching Deep Q-network (ARBDQ), which incorporates the action branching architecture into conventional DRL and modifies the traditional deep neural network training mechanism, so as to make judicious MCS decisions for different links in parallel. Then, a new many-to-one stable matching algorithm, called adaptive deferred acceptance, is employed to dynamically adjust the RB quota of each UE in a computationally efficient fashion. Simulation results demonstrate that, compared with the ACO-HM, OLLA-ADA, and ARBDQ-Random algorithms, DeepRM incurs much lower RB consumption while guaranteeing the QoS requirements of all UEs in various network scenarios. Furthermore, under different settings of QoS requirements, numbers of UEs, and CQI reporting periods, DeepRM is more robust than the other baselines.
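The abstract does not specify the exact architecture of ARBDQ; as a rough illustration of the action-branching idea it builds on (a shared state representation with one Q-value head per link, so MCS indices for all links are selected in a single forward pass), the following PyTorch sketch may help. The class name `BranchingQNetwork`, the hidden-layer sizes, and the per-link linear heads are illustrative assumptions and do not reflect the paper's actual ARBDQ design, which additionally modifies the reward branching and training mechanism.

```python
import torch
import torch.nn as nn


class BranchingQNetwork(nn.Module):
    """Illustrative action-branching Q-network (not the paper's ARBDQ):
    a shared trunk with one Q-value branch per link, so MCS decisions
    for all links are produced in a single forward pass."""

    def __init__(self, state_dim: int, num_links: int, num_mcs_levels: int, hidden: int = 128):
        super().__init__()
        # Shared feature extractor over the observed network state (assumed representation).
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One branch per link; each outputs Q-values over the MCS index set.
        self.branches = nn.ModuleList(
            [nn.Linear(hidden, num_mcs_levels) for _ in range(num_links)]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        feat = self.trunk(state)
        # Stack per-branch Q-values: shape (batch, num_links, num_mcs_levels).
        return torch.stack([branch(feat) for branch in self.branches], dim=1)


if __name__ == "__main__":
    net = BranchingQNetwork(state_dim=32, num_links=4, num_mcs_levels=15)
    q_values = net(torch.randn(8, 32))      # (8, 4, 15)
    mcs_choices = q_values.argmax(dim=-1)   # greedy MCS index per link
    print(mcs_choices.shape)                # torch.Size([8, 4])
```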