Abstract

In this article, we study the joint relay selection and power allocation problem, with the aim of maximizing the cumulative uplink performance of time-varying energy harvesting-driven underwater acoustic sensor networks (EH-UASNs). We propose a stratification-based, model-free deep reinforcement learning framework that combines the deep deterministic policy gradient (DDPG) and deep Q network (DQN) algorithms to solve this complex joint optimization problem. Specifically, the DQN optimizes the discrete relay selection strategy, and the DDPG optimizes the continuous power allocation strategy. The stratification-based framework tracks the complex state in a divide-and-conquer manner; as a result, the proposed algorithm can explore a larger solution space with high learning efficiency. Within this framework, we reconstruct the state by introducing the available outdated channel information and the battery capacity, which enriches the information available for learning. Furthermore, to balance instantaneous demand against long-term quality of service (QoS), we propose a reward mechanism that induces the agent to adaptively adjust its power allocation strategy to match the dynamic environment. Simulation results validate the effectiveness of the proposed algorithm.
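To make the division of labor between the two learners concrete, the following is a minimal sketch of one decision step, not the authors' implementation. It assumes a state formed by concatenating outdated per-relay channel gains with the battery level, an epsilon-greedy choice over Q-values for the discrete relay, and a deterministic actor with Gaussian exploration noise for the continuous power. The names `N_RELAYS`, `P_MAX`, `build_state`, `select_relay`, and `allocate_power` are hypothetical, the random linear maps stand in for trained networks, and conditioning the actor on the chosen relay is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_RELAYS = 4               # hypothetical number of candidate relays
P_MAX = 2.0                # hypothetical maximum transmit power
STATE_DIM = N_RELAYS + 1   # one outdated gain per relay + battery level

def build_state(outdated_gains, battery_level):
    """Concatenate outdated channel gains with the current battery level."""
    gains = np.asarray(outdated_gains, dtype=float)
    return np.concatenate([gains, [float(battery_level)]])

# Random linear maps standing in for trained networks (assumption).
W_q = rng.normal(size=(N_RELAYS, STATE_DIM))    # DQN-like Q-value head
W_mu = rng.normal(size=STATE_DIM + N_RELAYS)    # DDPG-like actor

def select_relay(state, epsilon=0.1):
    """Discrete layer: epsilon-greedy choice over per-relay Q-values."""
    if rng.random() < epsilon:
        return int(rng.integers(N_RELAYS))
    return int(np.argmax(W_q @ state))

def allocate_power(state, relay, noise_std=0.1):
    """Continuous layer: deterministic action plus exploration noise."""
    one_hot = np.eye(N_RELAYS)[relay]
    x = np.concatenate([state, one_hot])
    squashed = 1.0 / (1.0 + np.exp(-(W_mu @ x)))   # sigmoid into (0, 1)
    power = P_MAX * squashed + rng.normal(scale=noise_std)
    return float(np.clip(power, 0.0, P_MAX))       # keep power feasible

state = build_state([0.5, 0.2, 0.9, 0.1], battery_level=1.5)
relay = select_relay(state)
power = allocate_power(state, relay)
print(f"relay={relay}, power={power:.3f}")
```

Splitting the action this way lets each learner work in the action space it suits: the Q-value head only ranks a finite set of relays, while the actor outputs a single bounded real number.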
