The growing scale and complexity of the Industrial Internet of Things (IIoT) pose new challenges to the traditional routing protocol for low-power and lossy networks (RPL) in terms of dynamic management, data transmission reliability, and energy efficiency. This paper proposes MADC, a scalable deep reinforcement learning (DRL) algorithm for routing optimization built on a multi-attention actor double critic model, to meet the IIoT's need for efficient and intelligent routing decisions while improving data transmission reliability and energy efficiency. Specifically, MADC adopts the centralized training and decentralized execution (CTDE) learning paradigm to decouple the model's training and inference tasks, which reduces the difficulty and computational cost of learning and improves training efficiency. In addition, MADC includes a lightweight actor network based on a multi-scale convolutional attention mechanism, which gives resource-constrained nodes intelligent, real-time decision-making capability at low computational and storage complexity. Moreover, a scalable critic network that uses multiple attention mechanisms is proposed. It not only adapts to dynamic network environments but also evaluates local observation states more comprehensively and accurately, providing more accurate and efficient guidance for model optimization. Furthermore, MADC incorporates a double critic network architecture to mitigate potential overestimation during training, thereby improving the model's robustness and reliability. Simulation results demonstrate that MADC outperforms existing RPL optimization algorithms in terms of energy efficiency, data transmission reliability, and adaptability.
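The abstract does not say how the two critics are combined, so the following is only an illustrative sketch of one common way a double critic can curb overestimation: the learning target is built from the smaller of the two critics' value estimates, as in clipped double Q-learning. The function name `double_critic_target`, its parameters, and the default discount factor are assumptions made for illustration and are not taken from MADC.

```python
def double_critic_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Temporal-difference target from two critic estimates.

    Assumption (not from the paper): the two critics are combined by
    taking the minimum of their next-state value estimates, which biases
    the target downward and so counteracts overestimation.
    """
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    # Bootstrap from the more pessimistic of the two critics.
    return reward + gamma * min(q1_next, q2_next)


# Hypothetical values: the critics disagree about the next state's value.
target = double_critic_target(reward=1.0, q1_next=5.0, q2_next=3.0, gamma=0.5)
```

In this sketch a single optimistic critic cannot inflate the target, because the lower estimate always determines the bootstrapped term.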