Abstract

We investigate the problem of Remote Electrical Tilt (RET) optimization using off-policy learning techniques devised for Contextual Bandits (CBs). The goal in RET optimization is to control the vertical tilt angle of antennas at base stations to optimize key performance indicators representing the Quality of Service (QoS) perceived by the users in cellular networks. Learning an improved tilt update policy is hard. On the one hand, learning a policy online in a real network requires exploring tilt updates that have never been applied before, which is operationally too risky. On the other hand, devising such a policy via simulations suffers from the simulation-to-reality gap. In this paper, we circumvent these issues by learning an improved policy offline, using existing data collected on real networks. We formulate the problem of devising such a policy using the off-policy contextual bandit framework. We propose CB learning algorithms to extract optimal tilt update policies from the data. We train and evaluate these policies on real-world cellular network data. Our policies show consistent improvements over the rule-based logging policy used to collect the data.
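To make the off-policy contextual bandit setting concrete, the sketch below illustrates inverse propensity scoring (IPS), a standard estimator in this framework for evaluating a new policy from data logged by another policy. Everything here is illustrative, not taken from the paper: the contexts stand in for cell KPIs, the three discrete actions stand in for tilt updates, and the logging propensities mimic a rule-based policy biased toward "no change".

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged dataset: (context, action, reward, logging propensity).
# All dimensions and distributions are illustrative assumptions.
n_samples, n_actions, dim = 5000, 3, 4
X = rng.normal(size=(n_samples, dim))

# Rule-based logging policy: near-uniform, biased toward action 1 ("no change").
log_probs = np.full(n_actions, 1.0 / n_actions)
log_probs[1] += 0.1
log_probs /= log_probs.sum()
A = rng.choice(n_actions, size=n_samples, p=log_probs)
propensity = log_probs[A]  # probability the logger assigned to the taken action

# Hidden reward model (unknown to the learner): each action's expected
# reward is linear in the context, plus observation noise.
W = rng.normal(size=(n_actions, dim))
R = (W[A] * X).sum(axis=1) + rng.normal(scale=0.1, size=n_samples)

def ips_value(pi_probs, rewards, propensity):
    """IPS estimate of a target policy's value from logged data.

    pi_probs[i] is the probability the target policy assigns to the
    logged action A[i] in context X[i]; reweighting by pi/propensity
    corrects for the mismatch between target and logging policies.
    """
    return np.mean(pi_probs * rewards / propensity)

# Evaluating the logging policy itself: the weights cancel, so the
# estimate reduces to the empirical mean logged reward.
v_log = ips_value(propensity, R, propensity)

# Evaluating a deterministic greedy policy that knows W (an oracle,
# purely to show the estimator at work): pi_probs is 1 where the
# logged action matches the greedy choice, else 0.
greedy = (X @ W.T).argmax(axis=1)
v_greedy = ips_value((greedy == A).astype(float), R, propensity)
```

In practice the target policy is not an oracle but is itself learned from the logged data, e.g. by maximizing an IPS-style objective; the estimator above is what makes such offline evaluation possible without deploying the policy on a live network.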
