Abstract

Using a multi-armed bandit technique, we propose centralized and semi-distributed online algorithms for load-balancing user association and handover in mmWave-enabled networks. Load balancing at all base stations (BSs) imposes explicit constraints that make the actions of all user equipment (UEs) co-dependent, a challenging twist for reinforcement learning. We propose a central load balancer that guarantees load balancing at all BSs at every learning step. We maintain two association vectors: one for the learning update, and a best-to-date one for data transmission, allowing UEs to transmit over the best associations found so far while continuing to participate in a background learning process indefinitely. For dynamic networks, we introduce a measurement model that captures rapid channel variations and user mobility. To minimize the handover rate, we also differentiate between the handover cost for transmission and that for learning, and introduce a learning handover cost that decreases with sojourn time. The proposed algorithms can be implemented online, as they require no offline training and adapt effectively to network dynamics. Numerical results show that the proposed algorithms exhibit fast learning convergence and outperform 3GPP handover, achieving an order-of-magnitude lower handover rate at a significantly higher network sum-rate, reaching within 94-97% of the near-optimal worst connection swapping benchmark algorithm.
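
As an illustration of the learning/transmission split described above, the following is a minimal sketch, assuming a UCB1-style bandit index per UE-BS pair, a greedy capacity-constrained assignment standing in for the central load balancer, and a Bernoulli reward model; all names, parameters, and the assignment rule are illustrative assumptions, not the paper's algorithm, and the handover-cost terms are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_UES, NUM_BSS = 8, 3        # toy network size (assumed)
BS_CAPACITY = 3                # per-BS load limit so all UEs can be served
TRUE_RATE = rng.uniform(0.2, 1.0, size=(NUM_UES, NUM_BSS))  # unknown mean rates

counts = np.zeros((NUM_UES, NUM_BSS))      # pulls of each (UE, BS) arm
rate_est = np.zeros((NUM_UES, NUM_BSS))    # empirical mean rate per arm
best_assoc = np.zeros(NUM_UES, dtype=int)  # best-to-date association (for transmission)
best_sum_rate = -np.inf


def ucb_index(t):
    """UCB1-style exploration index for every (UE, BS) arm."""
    bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
    return np.where(counts == 0, np.inf, rate_est + bonus)


def central_load_balancer(index):
    """Greedy assignment that enforces the per-BS capacity at every learning step."""
    assoc = np.zeros(NUM_UES, dtype=int)
    load = np.zeros(NUM_BSS, dtype=int)
    for ue in np.argsort(-index.max(axis=1)):   # serve UEs with the largest index first
        for bs in np.argsort(-index[ue]):       # try that UE's BSs in decreasing index order
            if load[bs] < BS_CAPACITY:
                assoc[ue], load[bs] = bs, load[bs] + 1
                break
    return assoc


for t in range(2000):
    # Learning association: explore under the load-balancing constraint.
    learn_assoc = central_load_balancer(ucb_index(t))
    rewards = rng.binomial(1, TRUE_RATE[np.arange(NUM_UES), learn_assoc])

    # Update the empirical mean rate of the arms just played.
    for ue in range(NUM_UES):
        bs = learn_assoc[ue]
        counts[ue, bs] += 1
        rate_est[ue, bs] += (rewards[ue] - rate_est[ue, bs]) / counts[ue, bs]

    # Keep a separate best-to-date association for actual data transmission.
    est_sum_rate = rate_est[np.arange(NUM_UES), learn_assoc].sum()
    if est_sum_rate > best_sum_rate:
        best_sum_rate, best_assoc = est_sum_rate, learn_assoc.copy()

print("best-to-date association:", best_assoc)
print("estimated sum-rate:", round(float(best_sum_rate), 3))
```

The sketch keeps the two vectors separate in the spirit of the abstract: the learning association is re-drawn and explored at every step under the load constraint, while data transmission would use only the best-to-date association, which is updated whenever exploration finds a better feasible assignment.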
