In this letter, we focus on the design of a relay selection scheme for large-scale energy-harvesting wireless sensor networks. To capture the dynamic nature of practical networks, the model incorporates multiple features, including queuing state, energy level, channel quality, and location. Using reinforcement learning, we propose a novel relay selection scheme based on an actor-critic algorithm with linear function approximation that improves network reliability while also accounting for transmission delay and energy efficiency. The proposed scheme can be implemented independently at each source node to maximize the data delivery ratio; such a distributed design is more stable and scalable than centralized structures. A Lagrangian formulation is applied to satisfy the constraints on hop count and energy efficiency. Simulation results show that, compared with traditional timer-based and Q-learning-based schemes, the proposed scheme achieves good performance in terms of network reliability while obtaining higher energy efficiency and lower transmission delay.
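For readers who want a concrete picture of the kind of learner the letter describes, the following is a minimal, hypothetical sketch of an actor-critic relay selector with linear function approximation and a Lagrangian penalty term for constraint violations. The feature set, reward shaping, class and parameter names, and hyperparameters are illustrative assumptions, not the letter's exact formulation.

```python
# Hypothetical sketch: actor-critic relay selection with linear function
# approximation and a Lagrangian penalty on constraint costs (e.g., hop count,
# energy efficiency). Names and values are illustrative assumptions.
import numpy as np


class ActorCriticRelaySelector:
    def __init__(self, num_features, num_relays, alpha_actor=0.01,
                 alpha_critic=0.05, gamma=0.95, lagrange_multiplier=0.5):
        self.theta = np.zeros((num_relays, num_features))  # actor (policy) weights
        self.w = np.zeros(num_features)                    # critic (value) weights
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma
        self.lam = lagrange_multiplier  # weight on constraint violations

    def policy(self, features):
        """Softmax policy over candidate relays from linear preferences."""
        prefs = self.theta @ features
        prefs -= prefs.max()            # numerical stability
        probs = np.exp(prefs)
        return probs / probs.sum()

    def select_relay(self, features):
        """Sample a relay index according to the current policy."""
        probs = self.policy(features)
        return np.random.choice(len(probs), p=probs), probs

    def update(self, features, action, reward, constraint_cost,
               next_features, done):
        """One TD(0) actor-critic step on the Lagrangian-penalized reward."""
        lagrangian_reward = reward - self.lam * constraint_cost
        v = self.w @ features
        v_next = 0.0 if done else self.w @ next_features
        td_error = lagrangian_reward + self.gamma * v_next - v

        # Critic: semi-gradient TD(0) update of the linear value function.
        self.w += self.alpha_critic * td_error * features

        # Actor: policy-gradient step for the softmax-linear policy,
        # grad log pi(a|s) = (1[a' = a] - pi(a'|s)) * features for each relay a'.
        probs = self.policy(features)
        grad = -np.outer(probs, features)
        grad[action] += features
        self.theta += self.alpha_actor * td_error * grad


# Example use: each source node could run one agent; the feature vector might
# encode queuing state, residual energy, channel quality, and relay location.
agent = ActorCriticRelaySelector(num_features=4, num_relays=5)
x = np.array([0.3, 0.8, 0.6, 0.4])          # illustrative per-decision features
relay, _ = agent.select_relay(x)
agent.update(x, relay, reward=1.0, constraint_cost=0.2,
             next_features=np.array([0.2, 0.7, 0.5, 0.4]), done=False)
```

Because both the policy and the value function are linear in the features, each source node can run such an agent locally with negligible memory and computation, which is consistent with the distributed, scalable design the letter advocates.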