Multi-access Edge Computing and ubiquitous smart devices serve end users efficiently through emerging edge-deployed services. At the same time, mobile edge networks carry heavier and more time-varying traffic loads, so an efficient traffic forwarding mechanism is needed to handle routing in complex and highly dynamic edge environments. Deep Reinforcement Learning (DRL) has been introduced for this purpose because it can operate in a model-free manner. However, previous centralized DRL-based methods work in a turn-based way that does not match the real-time nature of routing. In this paper, we propose RTHop, a real-time and distributed learning approach that adapts to the volatile environment and performs hop-by-hop routing. RTHop uses Multi-Agent Deep Reinforcement Learning (MADRL) and the Real-Time Markov Decision Process (RTMDP) to alleviate network congestion and maximize the utilization of network resources. By incorporating a self-attention mechanism, RTHop extracts semantics from the elements of the network state, helping agents learn the importance of each element for routing. Experimental results show that RTHop not only overcomes the weakness of conventional turn-based DRL methods but also increases the delivered packet ratio and effective throughput compared with other routing methods.