Abstract

To maximize energy efficiency in heterogeneous networks (HetNets), a turbo Q-Learning (TQL) scheme combining a multistage decision process with tabular Q-Learning is proposed to optimize the resource configuration. To cope with the large action space, the energy-efficiency optimization problem is formulated as a multistage decision process: according to the resource allocation of the optimization objectives, the initial problem is divided into several subproblems, each solved by tabular Q-Learning, so that the size of the action space grows linearly with the number of resources rather than exponentially. The initial problem is then solved by iterating over the solutions of the subproblems, and a simple stability analysis of the algorithm is given. To cope with the large state space, a deep neural network (DNN) is used to classify states, with the optimization policy of the proposed Q-Learning used to label the training samples. Together, these two steps address the dimensionality of both the action and state spaces. Simulation results show that the approach converges, improves convergence speed by 60% while maintaining almost the same energy efficiency, and adapts to system adjustments.
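A rough count makes the claimed linear decomposition concrete (the symbols N and M are ours, introduced for illustration, not the paper's notation). With N resource variables, each taking M discrete values, a single joint Q-table must cover

\[ |\mathcal{A}_{\mathrm{joint}}| = M^{N} \]

actions, whereas N sub-Q-Learning tables of M actions each require only

\[ |\mathcal{A}_{\mathrm{TQL}}| = N \cdot M. \]

For N = 3 resources with M = 10 levels each, that is 1000 joint actions versus 30 decomposed ones.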

Highlights

  • With the dramatically growing number of wireless devices, more stringent requirements are placed on the performance and energy efficiency of heterogeneous networks (HetNets) [1]

  • The turbo Q-Learning (TQL) scheme is proposed by combining traditional Q-Learning with a multistage decision process in a loop iteration structure, in which each sub-Q-Learning solves one subproblem derived from the original optimization problem

  • Jointly optimizing resources to maximize the energy efficiency of HetNets with reinforcement learning (RL) runs into excessively large action and state spaces

Summary

Introduction

With the dramatically growing number of wireless devices, more stringent requirements are placed on the performance and energy efficiency of heterogeneous networks (HetNets) [1]. In this paper, inspired by previous works [2,4,5,6,7,8,9,10] and drawing on reinforcement learning (RL) and the idea of converting a non-convex, NP-hard problem into several subproblems, a turbo Q-Learning (TQL) scheme is proposed to optimize energy efficiency: the traditional Q-Learning algorithm is decomposed into several sub-Q-Learning algorithms arranged in a loop iteration structure, with each sub-Q-Learning solving one subproblem of the original optimization problem (a minimal sketch of this loop follows below). This decomposition effectively deals with the dimensional explosion caused by the growth of the action space in RL and greatly reduces the complexity of the optimization problem.
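As a concrete illustration of this loop iteration structure, here is a minimal sketch in Python, assuming a toy discretized environment: one tabular sub-Q-learner per resource dimension selects its own sub-action, the composite action is applied, and every sub-table is updated from the shared reward. The environment stub env_step, the state and action sizes, and all hyperparameters are illustrative assumptions, not the paper's implementation.

    import numpy as np

    # Hypothetical TQL sketch: the joint resource-allocation action is split
    # into K sub-actions, with one tabular sub-Q-learner per resource
    # dimension, iterated in a loop so the partial solutions refine each other.
    K = 3                      # number of resource dimensions (sub-problems)
    n_states = 50              # discretized state space (assumption)
    n_actions = [10, 10, 10]   # per-dimension sizes: 3 tables of 10 entries
                               # instead of one joint table of 10**3 = 1000
    alpha, gamma, eps = 0.1, 0.9, 0.1
    Q = [np.zeros((n_states, n)) for n in n_actions]  # one table per sub-problem

    def env_step(state, action):
        # Toy stand-in for the HetNet environment: random next state and an
        # illustrative reward; the real reward would be the energy efficiency.
        return np.random.randint(n_states), -float(sum(action))

    def select(q_row, n):
        # Epsilon-greedy choice for one sub-learner.
        return np.random.randint(n) if np.random.rand() < eps else int(q_row.argmax())

    state = 0
    for episode in range(1000):
        # Each sub-learner picks its sub-action; the composite action is the
        # tuple of all K sub-actions, as in a multistage decision process.
        action = [select(Q[k][state], n_actions[k]) for k in range(K)]
        next_state, reward = env_step(state, action)
        for k in range(K):
            # Standard tabular Q-Learning update, applied to each sub-table
            # with the shared reward observed for the composite action.
            td = reward + gamma * Q[k][next_state].max() - Q[k][state][action[k]]
            Q[k][state][action[k]] += alpha * td
        state = next_state

Note that the per-dimension tables together hold the sum of the per-dimension sizes rather than their product, which is exactly the linear rather than exponential growth of the action space described above.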

Traditional Methods for Optimizing HetNets
Machine Learning for Optimizing HetNets
Neural Networks for Optimizing HetNets
System Model
Problem Formulation
Solution with a TQL Algorithm
Q-Learning Algorithm
Sub-Problem A
Neural Network for the Classification of States
Numerical Simulation
Findings
Conclusions