Abstract

In this paper, we consider a power allocation optimization technique for a time-varying fading broadcast channel in energy harvesting communication systems, in which a transmitter with a rechargeable battery transmits messages to receivers using the harvested energy. We first prove that the optimal online power allocation policy for the sum-rate maximization of the transmitter is a monotonically increasing function of harvested energy, remaining battery, and the channel gain of each user. We then construct a lightweight neural network architecture that takes advantage of the monotonicity of the optimal policy. This two-step approach, which relies on effective function approximation to provide a mathematical guideline for neural network design, avoids wasting the representational capacity of neural networks. The tailored neural network architectures enable stable learning and eliminate the need for heuristic network design. For performance assessment, the proposed approach is compared with the closed-form optimal policy for a partially observable Markov problem. Additional experiments show that our online solution achieves performance close to the theoretical upper bound in a time-varying fading broadcast channel.

Highlights

  • In recent years, energy harvesting has become a promising method to solve the problem of energy-limited systems.

  • Wolpertinger policy method with tailored neural network architecture (WPTN): this is the method we propose in this paper.

  • In this paper, we proposed a novel two-step power allocation technique based on the policy gradient method to maximize the sum-rate for the fading broadcast channel in energy harvesting communication systems.

Summary

INTRODUCTION

Energy harvesting has become a promising method to solve the problem of energy-limited systems. We take a reinforcement learning (RL) approach to power allocation for sum-rate maximization in a time-varying fading broadcast channel with energy harvesting. Specifically, we model the energy harvesting communication system as a Markov decision process and solve the power allocation problem with an actor-critic framework [16]. We prove that the optimal power allocation policy is an increasing function of harvested energy, channel gains, and remaining battery. To the best of our knowledge, this is the first work to derive the structural properties of the optimal policy and the optimal value function for the infinite-horizon discounted sum-rate maximization problem in a time-varying fading broadcast channel for energy harvesting communication systems. Our approach applies to any independent and identically distributed random energy arrivals and channel gains.
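To make the monotonicity idea concrete, here is a minimal sketch of one common way to build a network that is non-decreasing in every input: keep all effective weights positive and use increasing activations. This summary does not describe the paper's actual architecture, so the class name `MonotonicMLP`, the layer sizes, the softplus weight transform, and the interpretation of the output as a fraction of available energy are all assumptions made for illustration.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); the result is always positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class MonotonicMLP:
    """Hypothetical two-layer network, non-decreasing in every input."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Raw parameters are unconstrained; the effective weights are
        # softplus(raw) > 0, which is what enforces monotonicity.
        self.raw_w1 = [[rng.gauss(0, 1) for _ in range(n_inputs)]
                       for _ in range(n_hidden)]
        self.b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
        self.raw_w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
        self.b2 = rng.gauss(0, 1)

    def forward(self, x):
        # tanh is increasing, so each hidden unit is non-decreasing in x.
        hidden = [
            math.tanh(sum(softplus(w) * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.raw_w1, self.b1)
        ]
        z = sum(softplus(w) * h for w, h in zip(self.raw_w2, hidden)) + self.b2
        # Sigmoid keeps the output in (0, 1); assumed here to be the
        # fraction of available energy to spend in the current slot.
        return 1.0 / (1.0 + math.exp(-z))


# Assumed input order: harvested energy, remaining battery, channel gain.
net = MonotonicMLP(n_inputs=3, n_hidden=8, seed=1)
low = net.forward([0.2, 1.0, 0.3])
high = net.forward([0.7, 1.0, 0.3])  # more harvested energy, rest unchanged
```

Because positive weights compose with increasing activations, `high` cannot be smaller than `low` regardless of the raw parameter values, so a training procedure only has to search among monotone functions rather than all functions.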

SYSTEM MODEL AND PROBLEM FORMULATION
MONOTONIC NETWORKS
PARTIALLY MONOTONIC NETWORKS
1: Build monotonic network with θ π and θ π
CONCLUSION
