Abstract

This study addresses the power allocation problem of maximizing the sum of the generalized mutual information, i.e., the achievable rate under imperfect channel state information, through a reinforcement learning (RL) approach in energy harvesting communications. In contrast to conventional deep RL applications, which impose a heavy computational load on devices due to the use of deep neural networks, we adopt shallow RL architectures that incorporate structural properties of the optimal power allocation policy. To design shallow architectures that can fully capture the desired power allocation policy, we derive the partial monotonicity of, and bounds on, the policy and value functions. These structural properties provide the mathematical basis on which the shallow architecture is constructed. We use a deterministic policy gradient method with monotonically shape-constrained approximators, which allows us to avoid overly complicated deep neural networks that are unsuitable for low-power devices. Through various experiments, we visualize the solutions obtained from the proposed shallow architectures and demonstrate that the proposed method outperforms existing power allocation policies and exhibits greater robustness owing to the optimal structural properties.
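To make the idea of a monotonically shape-constrained shallow approximator concrete, the following is a minimal sketch, not the paper's actual architecture. It assumes a single-hidden-layer policy whose output power is nondecreasing in a scalar state (e.g., a battery level); the names `MonotoneShallowPolicy`, `n_hidden`, and the battery-level input are illustrative assumptions, and monotonicity is enforced by mapping unconstrained parameters to nonnegative weights.

```python
import numpy as np

def softplus(x):
    # Smooth map from unconstrained reals to nonnegative values.
    return np.log1p(np.exp(x))

class MonotoneShallowPolicy:
    """Illustrative shallow policy pi(s) = sum_j v_j * sigmoid(w_j * s + b_j).

    With v_j >= 0 and w_j >= 0, pi(s) is nondecreasing in the scalar state s
    (assumed here to be, e.g., a normalized battery level). This is a sketch
    of a shape-constrained approximator, not the paper's exact design.
    """

    def __init__(self, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        # Unconstrained parameters; softplus maps them to nonnegative weights.
        self.a = rng.normal(size=n_hidden)   # -> v_j = softplus(a_j) >= 0
        self.c = rng.normal(size=n_hidden)   # -> w_j = softplus(c_j) >= 0
        self.b = rng.normal(size=n_hidden)   # hidden biases (unconstrained)

    def __call__(self, s):
        v = softplus(self.a)
        w = softplus(self.c)
        hidden = 1.0 / (1.0 + np.exp(-(w * s + self.b)))
        return float(np.sum(v * hidden))

# Usage: the allocated power never decreases as the battery level grows,
# by construction, regardless of the (random) parameter values.
policy = MonotoneShallowPolicy()
battery_levels = np.linspace(0.0, 1.0, 5)
powers = [policy(s) for s in battery_levels]
assert all(p1 <= p2 + 1e-12 for p1, p2 in zip(powers, powers[1:]))
print(powers)
```

In a deterministic policy gradient loop, the unconstrained parameters `a`, `c`, and `b` would be updated by gradient ascent on the critic's estimate, while the softplus reparameterization keeps the monotonic shape constraint satisfied at every step.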
