This paper studies access control for long-term throughput maximization in wireless communication systems with energy harvesting (EH). Many existing access schemes rely on accurate environmental information, such as channel information and the EH process. However, such information is costly to obtain, and exploration in high-dimensional spaces is expensive for traditional access control frameworks. This paper therefore proposes an access control framework based on hierarchical reinforcement learning (HRL). In HRL, the control problem, formulated as a Markov decision process (MDP), is decomposed into a multilevel sequential control problem with three subproblems: high-level channel number selection, mid-level channel selection, and low-level channel matching. The subproblems are solved in sequence, and their solutions at the different levels are combined to form the access scheme. In addition, to improve learning efficiency, a deterministic action (DA) module and a prior knowledge (PK) module are introduced. The DA module solves the channel matching problem under additional guidance from the preceding subproblem, so that definite, good low-level actions are selected. The PK module supplies the framework with common knowledge of the system structure, learned from a hypothetical environment, to obtain better initial performance. Experimental results show that the framework achieves better performance and learning efficiency than several recent transmission schemes.
Note to Practitioners: Access control is an important issue in wireless communication systems, where users must be scheduled under limited resources, such as energy, which is usually supplied by batteries.
In recent years, energy harvesting devices have been developed and applied to wireless communication systems to overcome the energy limitation. However, the energy harvesting capability of a system depends strongly on the environment, so most traditional control schemes, which rely on prior knowledge of the environment, perform poorly. Therefore, this paper proposes a novel model-free access control framework based on hierarchical reinforcement learning (HRL) that maximizes the throughput of a wireless communication system without any prior environmental knowledge. The scheme abstracts the original control problem into three control subproblems according to task and solves them sequentially, thereby simplifying the original problem. The scheme learns on its own and does not depend on prior knowledge of the environment. Moreover, it is suitable for large-scale environments, for which conventional end-to-end reinforcement learning is not. Compared with traditional algorithms, the method achieves better performance and higher learning efficiency.
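The three-level decomposition described above can be illustrated with a minimal sketch. It is not the paper's implementation: the value tables `q_high` and `q_mid`, the epsilon-greedy rule, and the battery-based matching rule are all hypothetical stand-ins chosen only to show how one joint access decision might be split into a channel count, a channel subset, and a deterministic user-to-channel matching.

```python
import random


def select_channel_count(q_high, n_channels, epsilon, rng):
    """High level: choose how many channels to use (1..n_channels).

    q_high[k - 1] is an assumed value estimate for using k channels;
    the epsilon-greedy rule is an illustrative choice.
    """
    if rng.random() < epsilon:
        return rng.randint(1, n_channels)
    return max(range(1, n_channels + 1), key=lambda k: q_high[k - 1])


def select_channels(q_mid, k):
    """Mid level: choose the k channels with the highest assumed values."""
    ranked = sorted(range(len(q_mid)), key=lambda c: q_mid[c], reverse=True)
    return sorted(ranked[:k])


def match_users(channels, battery):
    """Low level: deterministic matching (assumed rule, not the paper's).

    The users with the most stored energy receive the selected channels.
    """
    users = sorted(range(len(battery)), key=lambda u: battery[u], reverse=True)
    return dict(zip(users, channels))


def hierarchical_action(q_high, q_mid, battery, epsilon=0.0, rng=None):
    """Solve the three subproblems in sequence and combine the results."""
    rng = rng or random.Random(0)
    k = select_channel_count(q_high, len(q_mid), epsilon, rng)
    k = min(k, len(battery))  # cannot serve more users than exist
    channels = select_channels(q_mid, k)
    return match_users(channels, battery)


if __name__ == "__main__":
    assignment = hierarchical_action(
        q_high=[0.1, 0.9, 0.3],   # hypothetical values for using 1, 2, 3 channels
        q_mid=[0.2, 0.8, 0.5],    # hypothetical per-channel values
        battery=[1.0, 3.0, 2.0],  # hypothetical stored energy per user
    )
    print(assignment)  # maps user index -> channel index
```

In this sketch only the two upper levels would need to be learned, while the lowest level is a fixed rule; this is one plausible reading of how a deterministic low-level step could shrink the space that has to be explored.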