Abstract
This article considers a power allocation problem in energy harvesting downlink non-orthogonal multiple access (NOMA) systems, in which a transmitter uses harvested energy to send desired messages to their respective receivers. To tackle this problem, we adopt a reinforcement learning approach based on a shallow neural network structure. We prove that the optimal power allocation policy and the optimal action-value function depend monotonically on some of their input variables, and the shallow neural network structure is designed based on the properties revealed in the proof. Unlike deep learning methods, which tend to require substantial computational resources, this structure can fully capture the characteristics of the desired function with a single hidden layer. The optimized structure also makes the learning agents robust and reliable when learning from randomly occurring data. Furthermore, we provide comprehensive experimental results in harsh environments, where various arbitrary factors are assumed, to demonstrate the robustness of the proposed learning approach compared with deep neural networks designed without such structural grounds. It is also shown that the proposed learning process converges to a policy that outperforms existing power allocation algorithms.
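To make the abstract's core idea concrete, the sketch below shows one way a single-hidden-layer action-value approximator for power allocation could look. This is only an illustrative assumption, not the paper's actual architecture: the state and action dimensions, layer width, and variable names are all hypothetical.

```python
# Minimal sketch (assumed, not the paper's exact design): a single-hidden-layer
# Q-network mapping a state (e.g., battery level and channel gains) and a
# candidate power-allocation action to an estimated action value.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 3    # assumed: harvested-energy level plus two channel gains
ACTION_DIM = 2   # assumed: power split between two NOMA users
HIDDEN = 16      # single hidden layer, per the shallow-network idea

W1 = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
w2 = rng.normal(scale=0.1, size=HIDDEN)
b2 = 0.0

def q_value(state, action):
    """Estimate Q(s, a) with one hidden ReLU layer and a linear output."""
    x = np.concatenate([state, action])
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ w2 + b2

# Example query: a state vector against a candidate power split.
s = np.array([0.8, 0.5, 0.2])
a = np.array([0.7, 0.3])
print(q_value(s, a))
```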