Abstract

In this paper, we employ multiple wireless-powered relays to assist information transmission from a multi-antenna access point to a single-antenna receiver. Each relay can operate in either the passive mode via backscatter communications or the active mode via RF communications, depending on its channel conditions and energy state. We aim to maximize the overall throughput by jointly optimizing the transmit beamforming and the relays' radio modes and operating parameters. Because the problem is non-convex and combinatorial, we develop a novel optimization-driven hierarchical deep deterministic policy gradient (H-DDPG) approach to adapt the beamforming and relay strategies. First, the optimization-driven H-DDPG algorithm assigns the binary relay mode selection to an outer-loop deep $Q$-network (DQN) algorithm and then optimizes the continuous beamforming and relaying strategies with an inner-loop DDPG algorithm. Second, to improve the learning efficiency, we integrate model-based optimization into the inner-loop DDPG framework to provide a better-informed target estimation for deep neural network (DNN) training. Simulation results reveal that these two designs ensure a more stable learning performance and achieve up to 20% higher reward than the conventional model-free DDPG approach.
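
The two-level structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the epsilon-greedy rule standing in for the outer-loop DQN, and the rule of bootstrapping from the larger of the critic estimate and a model-based estimate are all assumptions made for illustration, since the abstract does not specify how the target estimation is combined.

```python
import random
from itertools import product


def enumerate_modes(num_relays):
    """List all binary mode vectors: 0 = passive (backscatter), 1 = active (RF)."""
    return list(product((0, 1), repeat=num_relays))


def select_modes(q_values, modes, epsilon, rng):
    """Outer-loop stand-in (hypothetical): epsilon-greedy choice of a mode vector.

    q_values[i] is the estimated value of modes[i]; a real DQN would produce
    these values from the observed channel and energy state.
    """
    if rng.random() < epsilon:
        return rng.choice(modes)  # explore a random mode vector
    best = max(range(len(modes)), key=lambda i: q_values[i])
    return modes[best]  # exploit the highest-valued mode vector


def informed_target(reward, gamma, critic_estimate, model_estimate):
    """Inner-loop stand-in (assumed rule): bootstrap from the larger estimate.

    critic_estimate would come from the DDPG target networks, and
    model_estimate from a model-based optimization of the same state.
    """
    return reward + gamma * max(critic_estimate, model_estimate)
```

For example, with two relays `enumerate_modes(2)` yields four candidate mode vectors, and `informed_target(1.0, 0.5, 2.0, 4.0)` bootstraps from the model-based estimate because it is the larger of the two.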
