Multi-Agent Reinforcement Learning With Measured Difference Reward for Multi-Association in Ultra-Dense mmWave Network

Xuebin Li, Allen B Mackenzie, Terry N Guo

doi:10.1109/access.2022.3221455

Abstract

Millimeter wave (mmWave) communication technology is anticipated to play a vital role in meeting the growing demand for scarce bandwidth in wireless communications. However, mmWave networks are highly susceptible to blockage, so mitigation techniques such as multi-connectivity need to be considered. Densely deploying mmWave base stations (mBSs) to form an ultra-dense network (UDN) also helps. With a mix of different technologies, optimally allocating resources becomes challenging. In this paper, we study mmWave user multi-association in a two-tier heterogeneous ultra-dense network (HetUDN) with a relatively large number of user equipments (UEs). We propose a multi-agent reinforcement learning (MARL) framework to tackle this complicated optimization problem, leveraging its adaptivity to the communication environment. The proposed scheme considers mmWave beam-division based multi-connectivity and takes advantage of a macro base station (MBS) for indirect cooperation among agents (UEs). In particular, we borrow a credit-assignment technique called difference reward (DR) to handle a relatively large MARL system with a large action space. To the best of our knowledge, this is the first application of MARL with DR to user association. Furthermore, the proposed schemes are scalable, mainly because the observation dimensions are fixed and each UE takes its actions independently, so the operation does not depend on the numbers of mBSs and UEs. Numerical results suggest that the two MARL schemes with measured DR can achieve a good balance between energy efficiency and QoS outage, and that the one using extended DR (EDR) offers additional performance improvement.

Full Text