Medley deep reinforcement learning-based workload offloading and cache placement decision in UAV-enabled MEC networks

Hongchang Ke, Hui Wang, Hongbin Sun

doi:10.1007/s40747-023-01318-7

Abstract

Internet of Things devices generate a large number of heterogeneous workloads in real time, each of which requires a specific application to process, and the inability of devices to communicate with base stations in complex scenarios is a thorny issue. Service caching plays a key role in handling request-specific workloads from devices, and unmanned aerial vehicles with computation and communication capabilities can effectively overcome the communication barrier between devices and ground base stations. In addition, the joint optimization of workload offloading and service cache placement is a key issue. Accordingly, we design an unmanned aerial vehicle-enabled mobile edge computing system with multiple devices, unmanned aerial vehicles, and edge servers. The proposed framework takes into account the randomness of workload arrivals, the time-varying nature of channel states, the limited capacity for hosting cached services, and wireless communication blocking. Furthermore, we formulate the workload offloading and service cache hosting decisions as an optimization problem that minimizes the long-term weighted average cost of latency and energy consumption. To tackle this joint optimization problem, we propose a request-specific workload offloading and service caching decision-making scheme based on medley deep reinforcement learning. To this end, the joint problem is decomposed into two subproblems: the workload offloading decision-making problem and the service cache hosting selection problem. For the first subproblem, we model each device as a learning agent and propose a workload offloading decision-making scheme based on multi-agent deep deterministic policy gradient. For the second subproblem, we present a decentralized double deep Q-learning scheme to determine the service cache hosting policy. Comprehensive experimental results show that the proposed scheme converges rapidly under various parameter configurations and outperforms four baseline learning algorithms.
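As a rough illustration of the two ingredients named above, the following minimal sketch shows a weighted latency-energy cost and the standard double deep Q-learning target. It is not the authors' implementation: the function names, the weight `w_latency`, the discount factor, and the idea of using the negative cost as the reward are all assumptions made for this example.

```python
from typing import Sequence


def weighted_cost(latency: float, energy: float, w_latency: float = 0.5) -> float:
    """Weighted sum of latency and energy cost (the weight is a hypothetical parameter)."""
    return w_latency * latency + (1.0 - w_latency) * energy


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool,
) -> float:
    """Standard double DQN target for one transition.

    The online network selects the next action; the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    if done:
        return reward
    # Action selection uses the online network's Q-values.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Action evaluation uses the target network's Q-value for that action.
    return reward + gamma * q_target_next[best_action]


# Assumed reward shaping: a lower cost yields a higher reward.
reward = -weighted_cost(latency=2.0, energy=4.0, w_latency=0.5)
target = double_dqn_target(reward, 0.9, [0.2, 0.8], [0.5, 0.3], done=False)
```

In this sketch each caching agent would regress its online Q-value toward `target`; how the paper actually defines states, actions, and rewards is not specified in the abstract.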

Full Text