In the Industrial Internet of Things (IIoT), devices with limited computing power and energy storage often rely on offloading tasks to edge servers for processing. However, existing offloading methods are plagued by high device communication costs and unstable training. Deep reinforcement learning (DRL) has therefore emerged as a promising solution to the computation offloading problem. In this paper, we propose a DRL-based framework called the multi-agent twin delayed shared deep deterministic policy gradient algorithm (MASTD3). First, we formulate task offloading as a long-term optimization problem, which helps each device decide whether to execute a task locally or remotely and leads to more effective offloading management. Second, we enhance MASTD3 with a prioritized experience replay buffer mechanism and a model sample replay buffer mechanism, improving sample utilization and overcoming the cold-start problem associated with long-term optimization. Moreover, we refine the actor-critic structure so that all agents share a single critic network, which accelerates convergence during training and reduces computational cost at runtime. Finally, experimental results demonstrate that MASTD3 effectively addresses the proportional offloading problem, improving performance by 44.32%, 29.26%, and 17.47% over DDPQN, MADDPG, and FLoadNet, respectively.
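To make the shared-critic idea concrete, the sketch below shows one possible PyTorch structure in which each agent keeps its own actor while all agents query a single twin (TD3-style) critic over the joint state-action. This is a minimal illustration under assumed names and dimensions (Actor, SharedTwinCritic, hidden sizes, number of agents), not the authors' implementation.

```python
# Illustrative sketch, NOT the authors' code: per-agent actors with one shared twin critic.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """One policy network per device/agent, mapping its local state to an offloading ratio."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Sigmoid(),  # offloading ratio in [0, 1]
        )

    def forward(self, state):
        return self.net(state)

class SharedTwinCritic(nn.Module):
    """A single twin (TD3-style) critic evaluated on the joint state-action of all agents."""
    def __init__(self, joint_state_dim, joint_action_dim, hidden=256):
        super().__init__()
        def q_net():
            return nn.Sequential(
                nn.Linear(joint_state_dim + joint_action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, joint_state, joint_action):
        x = torch.cat([joint_state, joint_action], dim=-1)
        return self.q1(x), self.q2(x)  # twin estimates; the min is used for the target

# Usage: every agent keeps its own actor, but value estimation for all agents
# flows through the one shared critic instance (illustrative dimensions).
n_agents, state_dim, action_dim = 4, 10, 1
actors = [Actor(state_dim, action_dim) for _ in range(n_agents)]
critic = SharedTwinCritic(n_agents * state_dim, n_agents * action_dim)

states = torch.randn(32, n_agents, state_dim)                        # batch of joint states
actions = torch.stack([a(states[:, i]) for i, a in enumerate(actors)], dim=1)
q1, q2 = critic(states.flatten(1), actions.flatten(1))
target_q = torch.min(q1, q2)                                         # clipped double-Q value
```

Because there is only one critic, its parameters and gradient updates are amortized over all agents, which is consistent with the abstract's claim of faster convergence and lower runtime cost.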