We investigate the problem of access point (AP)-to-device association in a cell-free massive multiple-input multiple-output (MIMO) system. Using energy efficiency (EE) as the main metric, we determine the optimal association parameters subject to minimum rate constraints for all devices. Our formulation accounts for practical concerns including training errors, pilot contamination, and a central processing unit that has access only to statistical channel state information (CSI). The resulting EE maximization problem is highly non-convex and possibly NP-hard. We propose to solve it with model-free deep reinforcement learning (DRL) methods. Because the discrete action space of the posed optimization problem is very large, existing DRL approaches cannot be applied directly. We therefore approximate the large discrete action space with either a continuous set or a smaller discrete set, and modify existing DRL methods accordingly. These approximations yield a framework with tolerable complexity and satisfactory performance that can be readily applied to other challenging optimization problems in wireless communication. Simulation results corroborate the superior performance of the modified DRL methods over conventional approaches.
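To illustrate the idea of replacing a large discrete action space with a continuous one, the following is a minimal sketch and not the method of the paper. It assumes that each AP-device pair is a binary choice, so that M APs and K devices give 2^(MK) discrete actions, and that a continuous action in [0, 1]^(MK) is mapped back to an association matrix by thresholding. The function names, the threshold of 0.5, and the repair step that guarantees every device at least one AP are all hypothetical.

```python
import numpy as np


def discrete_action_count(num_aps, num_devices):
    """Size of the discrete action space, assuming one binary choice per AP-device pair."""
    return 2 ** (num_aps * num_devices)


def continuous_to_association(action, num_aps, num_devices, threshold=0.5):
    """Map a continuous action in [0, 1]^(M*K) to a binary M x K association matrix.

    Thresholding is an illustrative assumption, not the mapping used in the paper.
    """
    scores = np.asarray(action, dtype=float)
    if scores.size != num_aps * num_devices:
        raise ValueError("action must have num_aps * num_devices entries")
    scores = scores.reshape(num_aps, num_devices)

    # Entry (m, k) is 1 when AP m serves device k.
    assoc = (scores >= threshold).astype(int)

    # Hypothetical repair step: a device left without any AP is attached
    # to the AP with the largest score for that device.
    for k in range(num_devices):
        if assoc[:, k].sum() == 0:
            assoc[np.argmax(scores[:, k]), k] = 1
    return assoc


if __name__ == "__main__":
    # A continuous agent would output a vector like this one (2 APs, 2 devices).
    action = [0.9, 0.1, 0.2, 0.3]
    print(discrete_action_count(2, 2))
    print(continuous_to_association(action, 2, 2))
```

Under these assumptions, a continuous-action DRL agent only has to output MK real numbers, while the environment still receives a valid binary association.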