Edge servers, located in close proximity to mobile users, have emerged as key components for computation offloading in many Internet of Things (IoT) applications. Because edge resources are limited and shared among multiple mobile users, it is crucial for each user to choose an appropriate edge server for task offloading so that the cumulative utility of all users is maximized. Reinforcement learning (RL) algorithms, which are sequential and model-free, have been widely considered for this problem. However, coordinating the mobile users in a decentralized way remains a critical challenge. In this work, we propose a novel multiagent RL framework that learns to coordinate. The main idea is to introduce an additional “virtual” agent at the edge, which learns to broadcast public messages to the mobile users at each decision interval. We then enforce a positive correlation between each user’s offloading policy and the message. The underlying intuition is that the message can encode information about the edge resources and the other users’ policies, so the decentralized users are expected to make coordinated decisions. Theoretical analysis shows that our algorithm converges to equilibrium points under mild assumptions. In our experiments, the approach significantly outperforms baseline methods across different scenarios, and the results show that the broadcast message plays a critical role in coordinating the mobile users.
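To make the broadcast-and-correlate idea concrete, below is a minimal sketch of one way such a mechanism could be structured; it is not the authors' actual implementation. All names (VirtualAgent, UserPolicy, correlation_penalty) and design choices are illustrative assumptions, including the use of a negated Pearson correlation as a surrogate for the positive-correlation constraint and the simplifying assumption that the message dimension equals the number of edge servers.

```python
# Hypothetical sketch of the abstract's mechanism: an edge-side "virtual"
# agent broadcasts a public message; each decentralized user conditions its
# offloading policy on (local state, message); a correlation penalty ties
# the policy to the message. Names and architecture are illustrative only.
import torch
import torch.nn as nn

class VirtualAgent(nn.Module):
    """Edge-side agent mapping the observed edge state to a public message."""
    def __init__(self, edge_state_dim: int, msg_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(edge_state_dim, 64), nn.ReLU(),
            nn.Linear(64, msg_dim),
        )

    def forward(self, edge_state: torch.Tensor) -> torch.Tensor:
        # The same message is broadcast to every mobile user.
        return self.net(edge_state)

class UserPolicy(nn.Module):
    """Decentralized offloading policy conditioned on local state + message."""
    def __init__(self, local_dim: int, msg_dim: int, n_servers: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(local_dim + msg_dim, 64), nn.ReLU(),
            nn.Linear(64, n_servers),
        )

    def forward(self, local_state: torch.Tensor, message: torch.Tensor) -> torch.Tensor:
        logits = self.net(torch.cat([local_state, message], dim=-1))
        return torch.softmax(logits, dim=-1)  # distribution over edge servers

def correlation_penalty(action_probs: torch.Tensor, message: torch.Tensor) -> torch.Tensor:
    """Negated Pearson correlation between the user's action distribution and
    the message (assumed same dimension); minimizing this term pushes the
    policy to correlate positively with the broadcast message."""
    a = action_probs - action_probs.mean()
    m = message - message.mean()
    return -(a * m).sum() / (a.norm() * m.norm() + 1e-8)

# Toy usage: 3 users choosing among 4 edge servers, 4-dim message.
edge = VirtualAgent(edge_state_dim=8, msg_dim=4)
users = [UserPolicy(local_dim=5, msg_dim=4, n_servers=4) for _ in range(3)]
msg = edge(torch.randn(8))
for user in users:
    probs = user(torch.randn(5), msg)
    penalty = correlation_penalty(probs, msg)  # added to each user's RL loss
```

In this sketch, the penalty would be weighted and added to each user's standard RL objective, so that gradient updates jointly improve offloading utility and alignment with the public message; the paper's precise correlation constraint and training procedure may differ.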