Abstract
Caching popular contents in advance is an important technique for achieving low latency and reduced backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multiple-input multiple-output system is considered, where the locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, the average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and unknown in advance. Further, modeling CP variations across time as a Markov process, reinforcement Q-learning is employed to learn the content placement strategy that optimizes the long-term discounted ASP and average cache refresh rate. In Q-learning, the number of Q-updates is large, proportional to the number of states and actions. To reduce the space complexity and update requirements toward scalable Q-learning, two novel function-approximation-based Q-learning approaches (linear and non-linear) are proposed, in which only a constant number of variables (4 and 3, respectively) needs to be updated, irrespective of the number of states and actions. Convergence of these approximation-based approaches is analyzed. Simulations verify that these approaches converge and successfully learn a similar best content placement, which demonstrates the applicability and scalability of the proposed approximated Q-learning schemes.
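To illustrate the scalability idea from the abstract, the following is a minimal sketch of linear function-approximation Q-learning, where Q(s, a) is approximated as a weighted sum of a fixed-size feature vector, so each update touches only a constant number of weights regardless of the numbers of states and actions. The feature map `phi` below is an illustrative placeholder, not the one derived in the paper.

```python
import numpy as np

def phi(state, action):
    # Constant-size (4-dimensional) feature vector: bias plus simple
    # state/action features. Placeholder choice for illustration only.
    return np.array([1.0, state, action, state * action])

def q_value(w, state, action):
    # Linear approximation: Q(s, a) ~= w^T phi(s, a).
    return w @ phi(state, action)

def q_update(w, s, a, reward, s_next, actions, gamma=0.9, alpha=0.1):
    # One semi-gradient Q-learning step: only the 4 weights in w change,
    # however large the state and action spaces are.
    target = reward + gamma * max(q_value(w, s_next, b) for b in actions)
    td_error = target - q_value(w, s, a)
    return w + alpha * td_error * phi(s, a)

w = np.zeros(4)                 # constant number of learned variables
actions = [0, 1, 2]
w = q_update(w, s=1, a=2, reward=1.0, s_next=0, actions=actions)
```

The key contrast with tabular Q-learning is storage and update cost: a table needs one entry per (state, action) pair, while this approximation keeps only the weight vector.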
Highlights
With the continuous development of various intelligent devices such as smart vehicles, smart home appliances,
Manuscript received June 15, 2020; revised September 12, 2020 and December 15, 2020; accepted December 15, 2020.
Since the success probability is difficult to analyze with respect to the Poisson point process (PPP) of base stations (BSs) ΦBS and the SINR model in (4), we instead analyze another point process with a more tractable SINR model, provided that the two point processes are statistically equivalent, which is defined as follows
The Q-learning algorithm is run for the finite-states, finite-policies (FSFP) scenario with the following parameters: number of popularity profiles in the finite set {p ∈ P}, |P| = 8; cardinality of the set of caching probabilities |A| = 32; content library size F = 1024; cache size L = 32; decay factor β = 0.1; learning rate β1 = 0.7; 10³ steps per episode; and a maximum of 100 episodes
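The tabular FSFP setup above can be sketched as follows. The reward and transition functions are placeholders: the paper's reward combines the ASP and the cache refresh rate, which requires the full system model and is stubbed out here, and the exploration rate is an assumption not stated in the parameter list.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 8       # |P|: number of popularity profiles
N_ACTIONS = 32     # |A|: candidate caching-probability vectors
GAMMA = 0.1        # decay (discount) factor beta
ALPHA = 0.7        # learning rate beta_1
STEPS = 10**3      # steps per episode
EPISODES = 100     # maximum number of episodes
EPSILON = 0.1      # exploration rate (assumed; not given above)

def reward(s, a):
    # Placeholder standing in for the ASP/refresh-rate reward.
    return rng.random()

def step(s, a):
    # Placeholder Markov transition between popularity profiles.
    return rng.integers(N_STATES)

Q = np.zeros((N_STATES, N_ACTIONS))
s = 0
for _ in range(EPISODES):
    for _ in range(STEPS):
        # Epsilon-greedy action selection over the caching vectors.
        if rng.random() < EPSILON:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(Q[s].argmax())
        s_next = step(s, a)
        # Standard tabular Q-update.
        Q[s, a] += ALPHA * (reward(s, a) + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next
```

Note the table holds |P| × |A| = 256 entries here; the function-approximation variants proposed in the paper replace this table with a constant number of learned variables.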