We devise a joint antenna selection and user scheduling (JASUS) policy, based on a partially observable Markov decision process (POMDP), for a massive MIMO base station (BS) that is equipped with only a small number of RF chains and serves a large number of users. The users are served in different time slots within a frame. At the beginning of each frame, relying on partial channel state information (CSI) obtained from training between the selected antennas and the users, the BS assigns each user to a time slot in the frame and selects a subset of antennas to serve the users scheduled in each slot. Assuming that the channels evolve according to a Markov process and employing zero-forcing beamforming, we formulate the JASUS problem within a POMDP framework to obtain a real-time decision-making policy that maximizes the expected long-term sum-rate. We rigorously prove that, for positively correlated two-state channel models, the myopic policy is optimal for our POMDP-based JASUS problem for any number of RF chains and any number of users. Building on this result, we model the Rayleigh fading channels as first-order Gauss-Markov processes and devise a low-complexity myopic policy-based JASUS algorithm for massive MU-MIMO systems that relies only on partial CSI.
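To make the myopic-policy idea concrete, the following is a minimal illustrative sketch, not the paper's algorithm: it assumes a hypothetical two-state (on/off) Gilbert-Elliott channel per antenna with transition probabilities `P11 > P01` (the positive-correlation condition under which the abstract states the myopic policy is optimal), maintains a belief that each channel is "on", and greedily selects the antennas with the largest beliefs, as many as there are RF chains. The constants and function names are invented for illustration.

```python
import numpy as np

# Hypothetical transition probabilities: Pr[on -> on] and Pr[off -> on].
# Positive correlation means P11 > P01.
P11, P01 = 0.8, 0.2
N_ANTENNAS, N_RF = 8, 3  # many antennas, few RF chains

def propagate(belief):
    """One-step Markov belief update: Pr[channel is 'on' next slot]."""
    return belief * P11 + (1.0 - belief) * P01

def myopic_select(belief, k):
    """Myopic policy: pick the k antennas with the largest current belief."""
    return np.argsort(belief)[-k:][::-1]

rng = np.random.default_rng(0)
belief = rng.uniform(size=N_ANTENNAS)  # initial belief per antenna
for slot in range(3):
    chosen = myopic_select(belief, N_RF)
    # Selected antennas are trained, so their state is observed exactly;
    # here we simulate that observation from the current belief.
    observed = (rng.uniform(size=N_RF) < belief[chosen]).astype(float)
    # All beliefs propagate one slot forward; observed antennas restart
    # from their known state, the rest keep only partial CSI.
    belief = propagate(belief)
    belief[chosen] = observed * P11 + (1.0 - observed) * P01
```

Because beliefs are monotone in the channel's last known state when `P11 > P01`, ranking by belief and taking the top `k` is exactly the myopic (one-step) sum-rate maximizer in this toy model; the paper's contribution is proving this greedy choice is also long-term optimal.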