Rate Maximization with Reinforcement Learning for Time-Varying Energy Harvesting Broadcast Channels

Heasung Kim, Heecheol Yang, Jungwoo Lee, Nayoung Lee, Wonjae Shin

doi:10.1109/globecom38437.2019.9013583

Abstract

In this paper, we consider power allocation for a time-varying fading broadcast channel in energy harvesting communication systems, in which a transmitter with a rechargeable battery transmits messages to receivers using the harvested energy. We first prove that the optimal online power allocation policy for sum-rate maximization at the transmitter is an increasing function of the harvested energy, the remaining battery level, and each user's channel gain. We then construct a neural network that exploits this increasing behavior of the optimal policy. This two-step approach provides an effective function approximator as well as a fundamental guideline for neural network design, and it avoids wasting the representational capacity of the network. Based on this neural network, we apply the policy gradient method to solve the power allocation problem. To validate the performance of our approach, we compare it with the closed-form optimal policy in a partially observable Markov problem. Further experiments show that our online solution achieves performance close to the theoretical upper bound in a time-varying fading broadcast channel.
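To illustrate what a network constrained to be increasing in its inputs can look like, the following is a minimal sketch and not the architecture used in the paper. It assumes one common construction: non-negative weights (obtained here by passing unconstrained parameters through a softplus) combined with non-decreasing activations. The class name `MonotoneMLP`, the layer sizes, and the four-element state layout (harvested energy, remaining battery, two users' channel gains) are hypothetical choices made only for this example.

```python
import numpy as np


def softplus(z):
    # Numerically stable softplus log(1 + exp(z)); output is always positive.
    return np.logaddexp(0.0, z)


class MonotoneMLP:
    """Tiny two-layer network that is non-decreasing in every input.

    Assumption for illustration: monotonicity is enforced by mapping the
    unconstrained parameters V1, V2 to positive weights with softplus and
    by using non-decreasing activations (tanh, softplus).
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.V1 = rng.normal(size=(n_in, n_hidden))  # unconstrained parameters
        self.b1 = rng.normal(size=n_hidden)          # biases may take any sign
        self.V2 = rng.normal(size=(n_hidden, 1))
        self.b2 = rng.normal(size=1)

    def __call__(self, x):
        W1 = softplus(self.V1)            # positive first-layer weights
        W2 = softplus(self.V2)            # positive second-layer weights
        h = np.tanh(x @ W1 + self.b1)     # tanh is non-decreasing
        # Final softplus keeps the output (a transmit power) non-negative.
        return softplus(h @ W2 + self.b2)[..., 0]


# Hypothetical state: [harvested energy, remaining battery, gain 1, gain 2].
policy = MonotoneMLP(n_in=4, n_hidden=8)
state = np.array([0.5, 1.0, 0.8, 0.3])
power = policy(state)
```

Because every weight is positive and every activation is non-decreasing, raising any state component can never lower the output, so the network cannot represent non-monotone policies. In a policy gradient setting, the unconstrained parameters `V1`, `b1`, `V2`, `b2` would be the quantities updated by the gradient step.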

Full Text