In unmanned aerial vehicle (UAV)-assisted device-to-device (D2D) caching networks, the uncertainty arising from unpredictable content demands and time-varying user positions poses a significant challenge for traditional optimization methods, often rendering them impractical. Multi-agent deep reinforcement learning (MADRL) offers clear advantages in optimizing multi-agent decision-making and serves as an effective, practical alternative. However, its application in large-scale dynamic environments is severely limited by the curse of dimensionality and by communication overhead. To address these limitations, we develop a scalable heterogeneous multi-agent mean-field actor-critic (SH-MAMFAC) framework. The framework treats ground users (GUs) and UAVs as distinct agents and designs cooperative rewards that convert the resource allocation problem into a fully cooperative game, improving global network performance. We also implement a mixed-action mapping strategy to jointly handle discrete and continuous action spaces. A mean-field MADRL framework is introduced to reduce each agent's training load while improving the total cache hit probability (CHP). Simulation results show that the proposed algorithm improves CHP and reduces transmission delay. A comparison with existing mainstream deep reinforcement learning (DRL) algorithms shows that SH-MAMFAC significantly reduces training time and maintains a high CHP as the number of GUs grows. Additionally, comparisons with SH-MAMFAC variants that omit trajectory optimization or power control show that the proposed joint design significantly reduces transmission delay.
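To illustrate the mean-field idea behind SH-MAMFAC, the sketch below shows a critic that conditions on an agent's own action and the mean action of its neighbors, which keeps the critic's input size fixed as the number of agents grows. This is a minimal, hypothetical PyTorch example with assumed state/action dimensions and network sizes, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MeanFieldCritic(nn.Module):
    """Approximates Q(s, a_i, mean(a_neighbors)): conditioning on the neighbors'
    mean action keeps the input dimension constant regardless of agent count."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2 * action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, own_action, mean_neighbor_action):
        return self.net(torch.cat([state, own_action, mean_neighbor_action], dim=-1))

# Hypothetical setup: 10 GU agents, each with a 16-dim observation and a 4-dim action.
n_agents, state_dim, action_dim = 10, 16, 4
critic = MeanFieldCritic(state_dim, action_dim)
states = torch.randn(n_agents, state_dim)
actions = torch.randn(n_agents, action_dim)
# Mean-field approximation: each agent sees only the average of the other agents' actions.
mean_others = (actions.sum(dim=0, keepdim=True) - actions) / (n_agents - 1)
q_values = critic(states, actions, mean_others)  # shape: (n_agents, 1)
```

Because the mean action summarizes all neighbors, adding more GUs does not enlarge the critic's input, which is the source of the scalability gain the abstract refers to.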