Abstract

This paper investigates reinforcement learning for relay selection in delay-constrained buffer-aided networks. Buffer-aided relay selection significantly improves outage performance, but often at the price of higher latency. Modern communication systems such as the Internet of Things, on the other hand, often have strict latency requirements. It is thus necessary to find relay selection policies that achieve good throughput in the buffer-aided relay network while satisfying the delay constraint. With buffers employed at the relays and delay constraints imposed on the data transmission, finding the best relay selection becomes a complicated high-dimensional problem, which makes it hard for reinforcement learning to converge. In this paper, we propose a novel decision-assisted deep reinforcement learning approach to improve convergence. This is achieved by exploiting the a-priori information from the buffer-aided relay system. The proposed approaches achieve high throughput subject to delay constraints. Extensive simulation results are provided to verify the proposed algorithms.

Highlights

  • With the development of 5G communications, the Internet of Things (IoT) is becoming an increasingly important topic in the area of wireless networks [1]–[3].

  • It is known that relay selection is an efficient way to harvest diversity gains.

  • In [8], a relay selection scheme combined with feedback and adaptive forwarding in cooperative networks was studied.


Summary

INTRODUCTION

With the development of 5G communications, the Internet of Things (IoT) is becoming an increasingly important topic in the area of wireless networks [1]–[3]. A novel decision-assisted deep reinforcement learning approach is proposed to improve convergence. This is achieved by exploiting the a-priori information from the buffer-aided relay system. Two deep reinforcement learning algorithms, namely decision-assisted deep Q-learning and decision-assisted deep Sarsa, are proposed for buffer-aided relay selection subject to instantaneous delay constraints. This differs from existing buffer-aided schemes, which usually consider the average packet delay. For moderate delay constraints, the decision-assisted deep Sarsa can achieve the highest possible throughput in the two-hop relay network, making it a very attractive scheme in practice. The rest of the paper is organized as follows: Section II describes the system model; Section III formulates the problem of optimum relay selection in the delay-constrained buffer-aided relay network; Section IV defines the elements of reinforcement learning for the relay selection; Section V describes deep reinforcement learning for the relay selection; Section VI proposes the decision-assisted deep reinforcement learning, which exploits the a-priori information in the relay system; Section VII verifies the proposed algorithms with simulation; Section VIII concludes the paper.
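
As a rough illustration of how a-priori buffer information might assist the learner's decisions, the sketch below restricts a greedy Q-value choice to links that can actually carry a packet. The action encoding, the function names feasible_actions and masked_greedy, and the capacity parameter are assumptions made for this sketch and are not taken from the paper; the sketch also ignores channel outage and the delay constraint.

```python
def feasible_actions(buffers, capacity):
    """Return the indices of links that can carry a packet right now.

    Assumed encoding (hypothetical, for illustration only):
      action k      -> source transmits to relay k (needs free buffer space)
      action n + k  -> relay k transmits to the destination (needs a stored packet)
    where n is the number of relays.
    """
    n = len(buffers)
    receive = [k for k in range(n) if buffers[k] < capacity]
    transmit = [n + k for k in range(n) if buffers[k] > 0]
    return receive + transmit


def masked_greedy(q_values, buffers, capacity):
    """Pick the highest-valued action among the feasible ones.

    With at least one relay and capacity >= 1, the feasible set is never
    empty: every relay can either receive or transmit.
    """
    allowed = feasible_actions(buffers, capacity)
    return max(allowed, key=lambda a: q_values[a])


# Two relays with buffer size 2: relay 0 is empty, relay 1 is full.
buffers = [0, 2]
q_values = [0.1, 0.9, 0.8, 0.5]  # made-up values, one per action
choice = masked_greedy(q_values, buffers, capacity=2)
```

Here the unmasked maximum (action 1, source to the full relay 1) is skipped because that link cannot accept a packet, so the learner never spends a time slot on a transmission that the buffer state already rules out.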

SYSTEM MODEL
PROBLEM FORMULATION
Environment and action
Rewards
DECISION-ASSISTED DEEP REINFORCEMENT LEARNING
Punishment with negative rewards
Decision assisted learning
Simulation setup
Simulation results
Findings
CONCLUSION
