We propose a learning-based scheme to investigate the dynamic multi-channel access (DMCA) problem in the fifth generation (5G) and beyond networks with fast time-varying channels wherein the channel parameters are unknown. The proposed learning-based scheme can maintain near-optimal performance for a long time, even in the sharp changing channels. This scheme greatly reduces processing delay, and effectively alleviates the error due to decision lag, which is cased by the non-immediacy of the information acquisition and processing. We first propose a psychology-based personalized quality of service model after introducing the network model with unknown channel parameters and the streaming model. Then, two access criteria are presented for the living streaming model and the buffered streaming model. Their corresponding optimization problems are also formulated. The optimization problems are solved by learning-based DMCA scheme, which combines the recurrent neural network with deep reinforcement learning. In the learning-based DMCA scheme, the agent mainly invokes the proposed prediction-based deep deterministic policy gradient algorithm as the learning algorithm. As a novel technical paradigm, our scheme has strong universality, since it can be easily extended to solve other problems in wireless communications. The real channel data-based simulation results validate that the performance of the learning-based scheme approaches that derived from the exhaustive search when making a decision at each time-slot, and is superior to the exhaustive search method when making a decision at every few time-slots.
Read full abstract