Abstract

Data collection is a fundamental operation in energy harvesting industrial Internet of Things networks. To this end, we consider a hybrid access point (HAP) or controller that is responsible for charging sensor devices and collecting <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$L$</tex-math></inline-formula> bits from them. The problem at hand is to optimize the transmit power allocation of the HAP over multiple time frames. The main challenge is that the HAP has only causal channel state information of its channels to devices. In this article, we outline a novel two-step reinforcement learning with Gibbs sampling (TSRL-Gibbs) strategy, where the first step uses Q-learning and an action space comprising transmit power allocations sampled from a multidimensional simplex. The second step applies Gibbs sampling to further refine the action space. Our results show that TSRL-Gibbs requires up to 28.5% fewer frames than competing approaches.
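The abstract does not give the authors' construction, but sampling transmit power allocations from a multidimensional simplex can be sketched as follows: a power split across frames that sums to a total budget is a scaled point on the simplex, and drawing from a symmetric Dirichlet(1, …, 1) distribution yields uniform samples on it. The function name, arguments, and the choice of Dirichlet sampling below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def sample_simplex_actions(num_actions, num_frames, total_power, seed=None):
    """Sample candidate power-allocation actions uniformly from the
    (num_frames - 1)-simplex, scaled so each action sums to total_power.

    Each row is one candidate action: a per-frame transmit power split.
    NOTE: hypothetical sketch; the paper's exact sampling scheme may differ.
    """
    rng = np.random.default_rng(seed)
    # Dirichlet(1, ..., 1) is the uniform distribution on the simplex.
    weights = rng.dirichlet(np.ones(num_frames), size=num_actions)
    return weights * total_power

# Example: 8 candidate actions splitting a unit power budget over 4 frames.
actions = sample_simplex_actions(num_actions=8, num_frames=4, total_power=1.0)
```

Each row of `actions` is non-negative and sums to the power budget, so the set can serve directly as a discretized action space for Q-learning.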
