This paper investigates the Dynamic Spectrum Access (DSA) paradigm with imperfect feedback for a multiuser wireless network. In each time slot, every user selects an orthogonal channel and transmits a packet with a certain transmission probability. In the next time slot, a user that transmitted a packet receives an ACK signal based on local observation. Given the dynamic nature of wireless networks, it is appealing to develop a blended strategy to perform effective DSA. This paper aims to design a distributed Deep Reinforcement Learning (DRL) based scheme with the objective of maximizing network utility. The conventional DRL framework assumes that the received feedback (the ACK packet) is always correct, but in wireless networks it may be lost or corrupted due to noise. Furthermore, it is challenging to guarantee that multiple agents will cooperate and make coherent decisions toward the same objective, particularly under imperfect feedback. To tackle these challenges, this work proposes (i) a Deep Recurrent Reinforcement Learning network with an integrated Gated Recurrent Unit (GRU) layer to optimize the network utility function and (ii) a feedback recovery mechanism using complete and incomplete replay buffers. Extensive simulations corroborate the success of the proposed scheme in complex multiuser scenarios and show that it is robust against the detrimental effects of imperfect feedback.