MU-MIMO technology is adopted in 5G to support the growing number of user terminals accessing 5G IoT systems. The user scheduling algorithms for MIMO systems in the existing literature are essentially greedy algorithms, which must repeatedly calculate the achievable data rate (or a low-complexity characterization of it) of each user during user selection. Because the number of IoT terminals is large, these methods incur a heavy computational load. In this paper, we propose a multiuser scheduling algorithm for 5G IoT systems based on reinforcement learning. Each user terminal's action-value, which denotes the expectation of that terminal's achievable data rate, is obtained through Q-learning. We define the Q-value as the upper bound of the confidence interval of the user terminal's action-value, and the proposed algorithm selects users on the basis of the Q-value. The proposed algorithm does not need to try different user combinations to maximize throughput, nor does it need to repeatedly calculate each user's achievable data rate, so the computational load is reduced. Simulation and numerical results show that the computational complexity of the proposed algorithm is lower than that of existing algorithms, while the system throughput it achieves is not lower than that of greedy algorithms.
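
The selection rule described above can be illustrated with a minimal sketch. The abstract does not give the exact confidence bound, update rule, or reward model, so everything here is an assumption: the bonus term is the standard UCB1-style form, the action-value is a running sample mean, and the observed data rate is drawn from a Gaussian that ignores inter-user interference. The names `select_users`, `update`, `simulate`, and the parameter `c` are hypothetical and are not taken from the paper.

```python
import math
import random


def select_users(q_mean, counts, t, num_select, c=2.0):
    """Return the indices of the num_select users with the largest upper bounds.

    q_mean[k] is the estimated action-value (expected data rate) of user k,
    counts[k] is how often user k has been scheduled, and t is the round index.
    The bonus sqrt(c * ln(t) / counts[k]) is an assumed UCB1-style interval width.
    """
    def upper_bound(k):
        if counts[k] == 0:
            return float("inf")  # untried users are scheduled first
        return q_mean[k] + math.sqrt(c * math.log(t) / counts[k])

    ranked = sorted(range(len(q_mean)), key=upper_bound, reverse=True)
    return ranked[:num_select]


def update(q_mean, counts, user, rate):
    """Incrementally update the sample-mean estimate for one scheduled user."""
    counts[user] += 1
    q_mean[user] += (rate - q_mean[user]) / counts[user]


def simulate(true_rates, num_select, rounds, seed=0):
    """Run the scheduler against a toy channel with fixed mean rates (assumed)."""
    rng = random.Random(seed)
    q_mean = [0.0] * len(true_rates)
    counts = [0] * len(true_rates)
    for t in range(1, rounds + 1):
        for user in select_users(q_mean, counts, t, num_select):
            # Toy reward: noisy rate around the user's mean, clipped at zero.
            rate = max(0.0, rng.gauss(true_rates[user], 0.1))
            update(q_mean, counts, user, rate)
    return q_mean, counts


if __name__ == "__main__":
    estimates, schedule_counts = simulate([0.2, 0.9, 0.5, 1.5], num_select=2, rounds=500)
    print(estimates)
    print(schedule_counts)
```

Each round costs one bound evaluation per user plus a sort, and no candidate user combination is ever scored, which is the source of the complexity saving the abstract claims. A real MU-MIMO reward would depend on which users are co-scheduled, something this toy model deliberately leaves out.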