Abstract
Deep reinforcement learning (DRL) has been successfully applied to joint routing and resource management in large-scale cognitive radio networks. However, it requires extensive trial-and-error interaction with the environment, which leads to high energy consumption and transmission delay. In this paper, an apprenticeship learning scheme is proposed for energy-efficient cross-layer routing design. First, to guarantee energy efficiency and compress the huge action space, a novel concept called the dynamic adjustment rating is introduced, which regulates transmit power efficiently through a multi-level transition mechanism. On this basis, Prioritized Memories Deep Q-learning from Demonstrations (PM-DQfD) is presented to speed up convergence and reduce memory occupation. PM-DQfD is then applied to cross-layer routing design to improve power efficiency and reduce routing latency. Simulation results confirm that the proposed method achieves higher energy efficiency, shorter routing latency, and a larger packet delivery ratio than traditional algorithms such as Cognitive Radio Q-routing (CRQ-routing), Prioritized Memories Deep Q-Network (PM-DQN), and the Conjecture Based Multi-agent Q-learning Scheme (CBMQ).
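The abstract does not spell out how the dynamic adjustment rating is realized, so the following Python sketch only illustrates the general idea under an assumed design: instead of selecting an absolute transmit power, the agent picks a bounded multi-level step up or down from its current power level, which keeps the joint (relay, power) action space small. The power levels, step sizes, and function names here are assumptions, not the paper's specification.

```python
# Illustrative sketch only: the exact "dynamic adjustment rating" mechanism is not
# given in this summary. Assumed design: the action adjusts the transmit-power index
# by a bounded multi-level step rather than choosing an absolute power value.

POWER_LEVELS_DBM = [0, 3, 6, 9, 12, 15, 18, 21]   # assumed discrete transmit powers
ADJUSTMENTS = [-2, -1, 0, +1, +2]                  # assumed multi-level transitions

def apply_adjustment(current_level: int, adjustment: int) -> int:
    """Move the transmit-power index by a bounded multi-level step."""
    new_level = current_level + adjustment
    return max(0, min(len(POWER_LEVELS_DBM) - 1, new_level))

# Joint action = (next-hop relay, power adjustment) instead of (relay, absolute power),
# so the power dimension of the action space scales with len(ADJUSTMENTS)
# rather than len(POWER_LEVELS_DBM).
```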
Highlights
Spectrum shortage has been one of the major bottlenecks for the development of wireless communication systems [1].
All of the nodes share the same frequency bands, whereas in multi-hop Cognitive Radio Networks (CRNs), each node may work with a different set of available channels owing to the uncertainty of Primary Users' (PUs') activity.
The decrease of demonstration data and the dynamic change of the self-generated data save the storage space, which can be used for storing more cache data or sensor data with higher precision.
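As an illustration of how shrinking the demonstration data can save storage, the sketch below assumes a single fixed-capacity replay buffer that is seeded with expert demonstrations and then gradually overwritten by self-generated transitions. The class and method names are hypothetical and are not taken from the paper.

```python
import random
from collections import deque

class ShrinkingDemoMemory:
    """Fixed-size memory seeded with expert demonstrations and gradually
    overwritten by the agent's own transitions (illustrative only)."""

    def __init__(self, capacity, demonstrations):
        self.buffer = deque(demonstrations[:capacity], maxlen=capacity)

    def add_self_generated(self, transition):
        # Appending to a full deque evicts the oldest entry, which at first is
        # demonstration data, so the demonstration share shrinks over time and
        # no separate, permanently stored demonstration buffer is needed.
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```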
Summary
Spectrum shortage has been one of the major bottlenecks for the development of wireless communication systems [1]. We consider a scenario in which there exists an expert source node that has already learned stable strategies. On this basis, new data flows are derived from another source node, and a small number of intermediate nodes change their locations in the network. If the new source node lies in a remote area of the network, the expert's strategies in the states of the new source node may be imperfect, since few data packets are relayed by it. Adopting a self-learning scheme in this case would take a long time and consume extensive energy [26,27]. Unlike a general learning scheme, DQfD allows the new source node to learn from the mature experience of the expert source node.
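The summary refers to DQfD-style learning from an expert node's experience. As a point of reference, the sketch below shows the standard DQfD loss combination (a 1-step TD loss plus a large-margin supervised loss on demonstrated actions) in PyTorch; it is not the paper's PM-DQfD implementation, and the n-step and regularization terms of full DQfD are omitted.

```python
import torch
import torch.nn as nn

def dqfd_losses(q_net, target_net, batch, gamma=0.99, margin=0.8):
    """Compute the combined DQfD pre-training loss on a batch of expert transitions."""
    states, actions, rewards, next_states, dones = batch   # demonstration transitions
    q = q_net(states)                                       # shape [B, num_actions]
    q_expert = q.gather(1, actions.unsqueeze(1)).squeeze(1)

    # 1-step TD loss toward the frozen target network.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = nn.functional.mse_loss(q_expert, td_target)

    # Large-margin supervised loss: max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E),
    # where l is `margin` for non-expert actions and 0 for the expert action,
    # pushing the demonstrated action above all alternatives.
    margins = torch.full_like(q, margin)
    margins.scatter_(1, actions.unsqueeze(1), 0.0)
    supervised_loss = ((q + margins).max(dim=1).values - q_expert).mean()

    return td_loss + supervised_loss
```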