Abstract

This work investigates the performance of an energy harvesting communications system consisting of a transmitter and a receiver. The transmitter is equipped with an infinite buffer to store data and with energy harvesting capability to harvest renewable energy and store it in a finite battery. The goal is to maximize the expected cumulative throughput of such a system. The problem of finding an optimal power allocation policy is formulated as a Markov decision process. Two cases are considered, depending on the availability of statistical knowledge about the channel gain and energy harvesting processes. When this knowledge is available, a look-ahead algorithm is designed to maximize the expected throughput while reducing the complexity of traditional methods (e.g., value iteration). This algorithm exploits instantaneous knowledge of the channel, the harvested energy, and the current battery level to find a near-optimal policy. In the second scenario, when the statistical knowledge is unavailable, reinforcement learning is used with two exploration algorithms: a convergence-based algorithm and the epsilon-greedy algorithm. Simulations and comparisons with conventional algorithms show the effectiveness of the look-ahead algorithm when the statistical knowledge is available, and the effectiveness of reinforcement learning in optimizing the system performance when this knowledge is unavailable.
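To make the model-free setting concrete, the sketch below shows a minimal tabular Q-learning loop with epsilon-greedy exploration on a toy version of the power allocation problem. The battery discretization, channel model, harvesting distribution, reward function, and all parameter values are illustrative assumptions for exposition, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discretization (assumed, not from the paper): integer battery
# levels 0..B_MAX, a small set of channel gains, and integer power actions.
B_MAX = 10                          # finite battery capacity (energy units)
CHANNEL_GAINS = [0.5, 1.0, 2.0]     # discretized channel states
EPISODES, SLOTS = 2000, 100
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

# Q-table indexed by (battery level, channel state index, transmit power).
Q = np.zeros((B_MAX + 1, len(CHANNEL_GAINS), B_MAX + 1))

def step(battery, gain_idx, power):
    """One time slot: spend `power` from the battery, collect a throughput
    reward, then harvest a random amount of energy whose statistics the
    agent never observes directly (the model-free assumption)."""
    reward = np.log2(1.0 + CHANNEL_GAINS[gain_idx] * power)  # throughput proxy
    harvested = rng.integers(0, 3)                 # assumed arrival process
    battery = min(B_MAX, battery - power + harvested)
    gain_idx = rng.integers(len(CHANNEL_GAINS))    # i.i.d. channel for simplicity
    return battery, gain_idx, reward

for _ in range(EPISODES):
    battery, gain_idx = B_MAX // 2, rng.integers(len(CHANNEL_GAINS))
    for _ in range(SLOTS):
        # Epsilon-greedy exploration, restricted to feasible powers <= battery.
        if rng.random() < EPSILON:
            power = int(rng.integers(battery + 1))
        else:
            power = int(np.argmax(Q[battery, gain_idx, :battery + 1]))
        nb, ng, r = step(battery, gain_idx, power)
        # Standard Q-learning update toward the one-step bootstrap target.
        target = r + GAMMA * Q[nb, ng, :nb + 1].max()
        Q[battery, gain_idx, power] += ALPHA * (target - Q[battery, gain_idx, power])
        battery, gain_idx = nb, ng

# Greedy policy read-out: learned transmit power per (battery, channel) pair
# (entries above the current battery level are infeasible and ignored at run time).
print(Q.argmax(axis=2))
```

Restricting both the exploratory and the greedy action to powers no larger than the current battery level enforces the energy causality constraint in every slot, which is the defining feature of the harvesting setting.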
