The rapid expansion of the Internet of Things (IoT) necessitates advanced routing schemes capable of meeting stringent demands for low latency and high accuracy, which are critical for applications such as autonomous vehicles and telemedicine. Traditional edge computing methods often suffer from elevated latency, rendering them unsuitable for time-sensitive applications. Additionally, many reinforcement learning (RL) algorithms require action space discretization, which can introduce biases and dimensionality challenges. This paper introduces a novel Soft Actor-Critic (SAC)-based distributed routing scheme for edge computing, specifically designed to address these limitations. By integrating RL with Maximum Entropy principles and employing a decentralized approach, the proposed model enhances network performance, reduces delays, and balances multiple optimization criteria. The distributed routing scheme operates independently of a centralized controller, allowing routers to make autonomous decisions and adapt seamlessly to changes in the network. This is accomplished through a Markov Decision Process (MDP) that optimizes routing paths based on factors including node depth, energy consumption, and transmission probability. The methodology encompasses local training phases for individual nodes, followed by federated training to refine the model across the network. Experimental results on topologies of varying scales demonstrate the model's efficacy in achieving high accuracy and efficient convergence, particularly in dynamic IoT environments. These findings underscore the potential of the proposed SAC-based distributed routing scheme as a robust solution for enhancing routing efficiency and reliability in the evolving landscape of IoT applications.
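To make the MDP formulation concrete, the per-hop routing decision described above can be sketched as a scalar reward that trades off node depth, residual energy, and transmission probability. The `RouteState` structure and the weights below are illustrative assumptions for this sketch, not the paper's actual reward function.

```python
from dataclasses import dataclass


@dataclass
class RouteState:
    """Local state a router observes when scoring a candidate next hop.
    Fields mirror the factors named in the abstract; ranges are assumed."""
    node_depth: int          # hops from the candidate node to the sink
    residual_energy: float   # remaining energy fraction, in [0, 1]
    tx_success_prob: float   # estimated link transmission probability, in [0, 1]


def reward(state: RouteState,
           w_depth: float = 0.4,
           w_energy: float = 0.3,
           w_tx: float = 0.3) -> float:
    """Hypothetical weighted reward: prefer shallow, energy-rich,
    reliable next hops. Weights are illustrative, not from the paper."""
    depth_term = 1.0 / (1.0 + state.node_depth)  # shallower nodes score higher
    return (w_depth * depth_term
            + w_energy * state.residual_energy
            + w_tx * state.tx_success_prob)
```

In a decentralized deployment each router would evaluate such a reward from purely local observations, which is what lets the scheme operate without a centralized controller.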
IMPACT STATEMENT

The rapid expansion of Internet of Things (IoT) applications demands advanced routing solutions to ensure low latency and high accuracy, crucial for sectors like autonomous vehicles and telemedicine. Traditional edge computing methods struggle with elevated latency, while many reinforcement learning (RL) algorithms face challenges with action space discretization, leading to biases and dimensionality issues. This study introduces a novel Soft Actor-Critic (SAC)-based distributed routing scheme to address these limitations. Integrating Maximum Entropy principles with RL enhances exploration and decision-making stability. The decentralized approach allows routers to make autonomous, real-time decisions based on local network conditions, optimizing routing paths through a Markov Decision Process (MDP). Experimental results from various simulated IoT network topologies show the model's superior performance in reducing delays and maintaining bandwidth. This research paves the way for more reliable, low-latency IoT applications, significantly enhancing routing efficiency and network adaptability in dynamic IoT environments.
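The Maximum Entropy integration mentioned above follows the standard SAC formulation, which augments the expected return with a policy-entropy bonus weighted by a temperature parameter $\alpha$; the notation below is the generic SAC objective, not a formula taken from this paper:

$$
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\right]
$$

The entropy term $\mathcal{H}$ rewards stochastic policies, which is what drives the broader exploration and decision-making stability credited to the scheme.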