Abstract

Caching popular content in advance is an important technique for achieving low latency and reduced backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multiple-input multiple-output system is considered, where the locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, the average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and not known in advance. Further, modeling CP variations across time as a Markov process, reinforcement Q-learning is employed to learn the content placement strategy that optimizes the long-term discounted ASP and average cache refresh rate. In Q-learning, the number of Q-value updates is large and proportional to the numbers of states and actions. To reduce the space complexity and update requirements and thus make Q-learning scalable, two novel function-approximation-based Q-learning approaches (linear and non-linear) are proposed, in which only a constant number of variables (4 and 3, respectively) needs to be updated, irrespective of the numbers of states and actions. The convergence of these approximation-based approaches is analyzed. Simulations verify that the approaches converge and successfully learn a similar best content placement, which demonstrates the applicability and scalability of the proposed approximated Q-learning schemes.
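The core scalability idea above can be illustrated with a minimal sketch of linear function-approximation Q-learning: instead of storing an |S| × |A| table, a small weight vector w is maintained and Q(s, a) ≈ w·φ(s, a) is updated by a semi-gradient rule. The 4-dimensional feature map `phi`, the placeholder reward, and the random transitions below are illustrative assumptions, not the paper's exact ASP/refresh-rate formulation.

```python
import numpy as np

# Hypothetical 4-dimensional feature map over (state, action) pairs.
# The paper's exact features are not given in the abstract; this is a stand-in.
def phi(s, a, n_states=8, n_actions=32):
    return np.array([1.0,
                     s / n_states,
                     a / n_actions,
                     (s * a) / (n_states * n_actions)])

rng = np.random.default_rng(1)
w = np.zeros(4)          # only 4 variables are updated, regardless of |S| and |A|
gamma, alpha = 0.1, 0.05  # discount factor and step size (illustrative values)

for _ in range(1000):
    s, a = int(rng.integers(8)), int(rng.integers(32))
    r = rng.uniform()                 # placeholder reward sample
    s_next = int(rng.integers(8))     # placeholder state transition
    q_next = max(w @ phi(s_next, b) for b in range(32))
    # Semi-gradient TD(0) update: w <- w + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)) * phi(s,a)
    w += alpha * (r + gamma * q_next - w @ phi(s, a)) * phi(s, a)
```

Regardless of how many popularity profiles or caching vectors exist, the memory footprint stays at the length of `w`, which is the constant-update property the abstract highlights.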

Highlights

  • With the continuous development of various intelligent devices such as smart vehicles and smart home appliances, … (Manuscript received June 15, 2020; revised September 12, 2020 and December 15, 2020; accepted December 15, 2020.)

  • Since the success probability is difficult to analyze with respect to the Poisson point process (PPP) of base stations (BSs) ΦBS and the SINR model in (4), we focus on analyzing another point process with a more tractable SINR model, as long as the two point processes are statistically equivalent

  • The Q-learning algorithm is run for finite-states, finite-policies (FSFP) scenarios with the following parameters: number of popularity profiles in the finite set {p ∈ P}, |P| = 8; cardinality of the set of caching probabilities |A| = 32; content library size F = 1024; cache size L = 32; decay factor β = 0.1; learning rate β1 = 0.7; 10³ steps per episode; and a maximum of 100 episodes
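The BS locations in the system model are drawn from a homogeneous PPP. A minimal sketch of sampling such a process inside a disk (the intensity and radius values below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def sample_ppp_disk(intensity, radius, seed=None):
    """Sample a homogeneous Poisson point process with the given intensity
    (points per unit area) inside a disk of the given radius."""
    rng = np.random.default_rng(seed)
    area = np.pi * radius ** 2
    n = rng.poisson(intensity * area)           # Poisson-distributed point count
    r = radius * np.sqrt(rng.uniform(size=n))   # sqrt gives uniform density over area
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

# Example: BS locations at intensity 1e-4 BS/m^2 within a 1 km disk (assumed values)
bs = sample_ppp_disk(intensity=1e-4, radius=1000.0, seed=0)
```

The number of points is Poisson with mean (intensity × area), and conditioned on that count the points are i.i.d. uniform over the disk, which is the defining property of a homogeneous PPP.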
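The FSFP setup above maps directly onto tabular Q-learning over |P| = 8 states (popularity profiles) and |A| = 32 actions (caching probability vectors), with learning rate β1 = 0.7 and 10³ steps per episode. A minimal sketch, where the reward table, the Markov CP transition matrix, and the discount/exploration values are illustrative stand-ins rather than the paper's ASP/refresh-rate objective:

```python
import numpy as np

n_states, n_actions = 8, 32     # |P| and |A| from the paper's FSFP setup
beta1, gamma, eps = 0.7, 0.1, 0.1  # learning rate; discount and epsilon are assumed
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions))
reward = rng.uniform(size=(n_states, n_actions))          # placeholder reward table
P_next = rng.dirichlet(np.ones(n_states), size=n_states)  # placeholder Markov CP transitions

s = 0
for step in range(1000):  # one episode of 10^3 steps
    # epsilon-greedy action selection over caching probability vectors
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next = int(rng.choice(n_states, p=P_next[s]))
    # Standard Q-update: Q(s,a) <- Q(s,a) + beta1*(r + gamma*max_a' Q(s',a') - Q(s,a))
    Q[s, a] += beta1 * (reward[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

The table holds |P| × |A| = 256 entries here; the approximation-based schemes in the abstract replace this table with 4 (linear) or 3 (non-linear) learned variables.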
