Abstract

Improving energy efficiency has become increasingly important in the design of future cellular systems. In this work, we consider the issue of energy efficiency in D2D-enabled heterogeneous cellular networks. Specifically, communication mode selection and resource allocation are jointly optimized with the aim of maximizing long-term energy efficiency. A Markov decision process (MDP) problem is formulated, in which each user can switch dynamically between the traditional cellular mode and the D2D mode. We employ deep deterministic policy gradient (DDPG), a model-free deep reinforcement learning algorithm, to solve the MDP problem over continuous state and action spaces. The architecture of the proposed method consists of an actor network and a critic network. The actor network uses the deterministic policy gradient scheme to generate deterministic actions for the agent directly, while the critic network employs a value-function-based Q-network to evaluate the performance of the actor network. Simulation results demonstrate the convergence of the proposed algorithm and its effectiveness in improving energy efficiency in a D2D-enabled heterogeneous network.
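
To make the actor-critic structure described above concrete, the following is a minimal sketch of a DDPG update step in PyTorch. The state/action dimensions, network sizes, and learning rates are illustrative assumptions rather than values from the paper, and the target networks and exploration noise of the full DDPG algorithm are omitted for brevity.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # hypothetical sizes for the D2D scenario

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh())  # actions bounded in [-1, 1]

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q-network: scores a (state, action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(batch, gamma=0.99):
    """One DDPG step on a replay batch (s, a, r, s2); r has shape [B, 1]."""
    s, a, r, s2 = batch
    # Critic: regress Q(s, a) toward the one-step bootstrap target.
    with torch.no_grad():
        target = r + gamma * critic(s2, actor(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: deterministic policy gradient -- ascend Q(s, actor(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Example: one update on a random batch of 32 transitions.
B = 32
update((torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM),
        torch.randn(B, 1), torch.randn(B, STATE_DIM)))

In the paper's setting, the state would encode channel and network conditions, and the action would encode the mode-selection and resource-allocation decisions; the critic's scalar output estimates the long-term energy efficiency achieved by the actor's policy.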
