Abstract
In order to maximize energy efficiency in heterogeneous networks (HetNets), turbo Q-Learning (TQL), which combines a multistage decision process with tabular Q-Learning, is proposed to optimize the resource configuration. To handle the large action space, the energy efficiency optimization problem is formulated in this paper as a multistage decision process: according to the resource allocation of the optimization objectives, the initial problem is divided into several subproblems, each solved by tabular Q-Learning, so that the traditionally exponential growth of the action space is reduced to linear growth. By iterating over the solutions of the subproblems, the initial problem is solved, and a simple stability analysis of the algorithm is given. As for the large state space, a deep neural network (DNN) is used to classify states, with the optimization policy of the novel Q-Learning serving to label the training samples. In this way, the dimensionality of both the action space and the state space is addressed. Simulation results show that the proposed approach converges, improves convergence speed by 60% while maintaining almost the same energy efficiency, and supports system adjustment.
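The action-space decomposition described above can be illustrated with a minimal sketch. All names and values here are hypothetical: a separable toy reward stands in for the energy-efficiency objective, and deterministic greedy backups stand in for the epsilon-greedy updates of full tabular Q-Learning. The point is the bookkeeping: one joint table over K resource types with M levels each needs M^K entries, while K per-subproblem tables need only K*M, and looping over the subproblems recovers the joint optimum.

```python
# Toy sketch of the TQL decomposition (all names/values hypothetical):
# a joint action over K resource types with M levels each is optimized
# by K small tabular Q-learners iterated in a loop, instead of one
# exponentially large joint Q-table.

M, K = 8, 3                            # 8 power levels, 3 resource types (assumed)
print(M ** K, K * M)                   # 512 24  -> exponential vs linear table size

# Hypothetical separable "energy efficiency", peaking at a = (2, 5, 1)
def reward(a):
    return -sum((x - t) ** 2 for x, t in zip(a, (2, 5, 1)))

Q = [[0.0] * M for _ in range(K)]      # one small Q-table per subproblem
a = [0] * K                            # current joint resource configuration
for sweep in range(3):                 # loop-iteration structure of TQL
    for k in range(K):                 # solve subproblem k, others held fixed
        for act in range(M):
            trial = a[:k] + [act] + a[k + 1:]
            Q[k][act] = reward(trial)  # greedy backup stands in for
                                       # epsilon-greedy Q-learning updates
        a[k] = max(range(M), key=Q[k].__getitem__)
print(a)                               # [2, 5, 1] -- the joint optimum
```

For a separable reward, a single sweep already reaches the optimum; for coupled rewards, repeated sweeps play the role of the loop iteration between sub-Q-Learning blocks.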
Highlights
With the dramatically growing number of wireless devices, more stringent requirements are placed on the performance and energy efficiency of heterogeneous networks (HetNets) [1]
Turbo Q-Learning (TQL) is proposed by combining traditional Q-Learning with a multistage decision process in a loop-iteration structure, where each sub-Q-Learning solves one subproblem derived from the original optimization problem
In jointly optimizing resources to maximize the energy efficiency of HetNets by reinforcement learning (RL), the excessively large action and state spaces must be addressed
Summary
With the dramatically growing number of wireless devices, more stringent requirements are placed on the performance and energy efficiency of heterogeneous networks (HetNets) [1]. In this paper, inspired by previous works [2,4,5,6,7,8,9,10] on RL and on converting non-convex NP-hard problems into several subproblems, a turbo Q-Learning (TQL) scheme is proposed to optimize energy efficiency: the traditional Q-Learning algorithm is decomposed into several sub-Q-Learning algorithms arranged in a loop-iteration structure, with each sub-Q-Learning solving one subproblem of the original optimization problem. This effectively deals with the dimensional explosion caused by the growing action space in RL and greatly reduces the complexity of the optimization problem.
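The abstract's second ingredient, using the TQL policy to label samples for a DNN state classifier, can be sketched as follows. Everything here is a hypothetical stand-in: a simple threshold rule plays the role of the trained Q-Learning policy, states are assumed 2-dimensional, and a small one-hidden-layer network (written out in NumPy) learns to map unseen states to the policy's greedy action.

```python
import numpy as np
np.random.seed(0)

# Stand-in for the trained TQL policy argmax_a Q(s, a) (assumed rule):
def tql_policy(S):
    return (S[:, 0] > 0.5).astype(int)

X = np.random.rand(400, 2)         # sampled network states (assumed 2-D)
y = tql_policy(X)                  # labels produced by the Q-Learning policy

# Tiny one-hidden-layer classifier trained by full-batch gradient descent
W1 = np.random.randn(2, 16) * 0.5; b1 = np.zeros(16)
W2 = np.random.randn(16, 2) * 0.5; b2 = np.zeros(2)
lr = 0.5
for epoch in range(500):
    h = np.tanh(X @ W1 + b1)                       # hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)              # softmax probabilities
    grad = p.copy(); grad[np.arange(len(y)), y] -= 1
    grad /= len(y)                                 # dL/dlogits, cross-entropy
    dW2 = h.T @ grad; db2 = grad.sum(0)
    dh = grad @ W2.T * (1 - h ** 2)                # backprop through tanh
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2; W1 -= lr * dW1; b1 -= lr * db1

Xt = np.random.rand(200, 2)                        # unseen states
pred = np.argmax(np.tanh(Xt @ W1 + b1) @ W2 + b2, axis=1)
acc = (pred == tql_policy(Xt)).mean()
print(f"test accuracy: {acc:.2f}")                 # typically close to 1.0
```

Once trained, the classifier generalizes the tabular policy to states never visited during Q-Learning, which is how the large state-space dimension is handled.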