Millimeter wave (mmWave) and terahertz MIMO systems rely on pre-defined beamforming codebooks for both initial access and data transmission. These pre-defined codebooks, however, are commonly not optimized for specific environments, user distributions, or hardware impairments. This leads to large codebooks with high beam training overhead, which makes it hard for these systems to support highly mobile applications. To overcome these limitations, this paper develops a deep reinforcement learning framework that learns how to optimize the codebook beam patterns relying only on receive power measurements. The developed model learns how to adapt the beam patterns based on the surrounding environment, user distribution, hardware impairments, and array geometry. Further, this approach does not require any knowledge of the channel, RF hardware, or user positions. To reduce the learning time, the proposed solution adopts a novel Wolpertinger-variant architecture that is capable of efficiently searching the large discrete action space. The proposed learning framework respects RF hardware constraints such as the constant-modulus and quantized phase shifter constraints. Simulation results confirm the ability of the developed framework to learn near-optimal beam patterns for line-of-sight (LOS), non-LOS (NLOS), and mixed LOS/NLOS scenarios, as well as for arrays with hardware impairments, without requiring any channel knowledge.
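To make the hardware constraints and the feedback signal concrete, the following is a minimal Python sketch, not the implementation developed in this paper. It assumes a fully analog array whose phase shifters take one of 2^b uniformly spaced values, and it treats the receive power |w^H h|^2 as the only observable quantity. The function names (`quantize_phases`, `beam_from_phases`, `receive_power`) and the example channel are hypothetical and chosen for illustration only. Mapping a continuous phase vector to its nearest quantized values is loosely analogous to projecting a continuous proto-action onto the discrete action space with a single nearest neighbour.

```python
import numpy as np

def quantize_phases(proto_phases, num_bits):
    """Map continuous phases (radians) to the nearest of 2**num_bits
    uniformly spaced phase-shifter levels in [0, 2*pi)."""
    levels = 2 ** num_bits
    step = 2 * np.pi / levels
    # Round to the nearest level index and wrap around the circle.
    idx = np.round(np.asarray(proto_phases) / step).astype(int) % levels
    return idx * step

def beam_from_phases(phases):
    """Build a unit-norm beamforming vector whose entries all have the
    same magnitude (constant-modulus constraint)."""
    phases = np.asarray(phases)
    return np.exp(1j * phases) / np.sqrt(phases.size)

def receive_power(beam, channel):
    """Receive power |w^H h|^2 for beam w and channel h.
    The channel is used here only to simulate the measurement; the
    learner is assumed to observe the returned power and nothing else."""
    return float(np.abs(np.vdot(beam, channel)) ** 2)

# Illustrative use with a 4-element array and 2-bit phase shifters.
proto = [0.1, 1.5, 3.0, -0.2]          # hypothetical continuous phases
phases = quantize_phases(proto, num_bits=2)
beam = beam_from_phases(phases)
channel = np.exp(1j * phases)          # hypothetical channel aligned with the beam
power = receive_power(beam, channel)
```

In this sketch the power depends on the channel only through the scalar measurement, which mirrors the setting described above in which no explicit channel knowledge is available to the learner.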