In Energy-Harvesting Wireless Body Area Networks (EH-WBANs), a fundamental challenge is preserving the self-sustainability of sensors without compromising network reliability and connectivity. Scheduling when body nodes (BNs) sleep and wake is an efficient way to achieve self-sustainability. Each sleeping node should be connected to at least one active node, which reduces delay and keeps the network connected. Previous methods for determining BN sleep/wake schedules have two fundamental problems: (1) BNs suffer from emergency packet loss and unnecessarily frequent sleeping and waking, and (2) the methods do not guarantee network connectivity. Studies that have examined only connectivity in EH-WBANs also have two main issues: (1) BNs are assumed to be homogeneous in energy harvesting and consumption, and (2) the methods cannot adapt to the time-varying behavior of energy-harvesting sources. This study proposes a new sleep/wake scheduling method called Reinforcement Learning-based Sleep Scheduling (RLS2), which has two main innovations: (1) to avoid emergency packet loss and unnecessarily frequent sleeping and waking, each BN follows its own sleep/wake schedule based on its energy level and changes in its sensed data, and (2) to increase network reliability and connectivity, the smallest possible number of BNs is selected as relay nodes in each round; these relay nodes remain active throughout the round, while the other BNs follow their determined schedules.
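The per-BN scheduling idea in innovation (1) can be illustrated with a small rule. This is a minimal sketch, not the authors' rule: it assumes that residual energy is normalized to [0, 1] and that a relative change in the sensed value above a threshold signals a possible emergency. `BodyNode`, `next_sleep_slots`, and all thresholds are hypothetical, because the abstract does not give the actual schedule formula.

```python
from dataclasses import dataclass


@dataclass
class BodyNode:
    """Hypothetical per-BN state used to pick its own sleep interval."""
    energy: float        # residual energy, assumed normalised to [0, 1]
    last_reading: float  # previously sensed value


def next_sleep_slots(node, reading, min_slots=1, max_slots=10, emergency_delta=0.2):
    """Return how many slots the BN sleeps before waking (0 = stay awake).

    Assumed rule: a relative change >= emergency_delta keeps the node awake;
    otherwise lower energy and a more stable signal both lengthen the sleep.
    """
    change = abs(reading - node.last_reading) / max(abs(node.last_reading), 1e-9)
    node.last_reading = reading
    if change >= emergency_delta:
        return 0  # sudden change: possible emergency, do not sleep
    energy_term = 1.0 - node.energy             # low energy -> sleep longer
    stability = 1.0 - change / emergency_delta  # small change -> sleep longer
    slots = min_slots + (max_slots - min_slots) * energy_term * stability
    return round(slots)


if __name__ == "__main__":
    bn = BodyNode(energy=0.2, last_reading=70.0)
    print(next_sleep_slots(bn, 71.0))  # stable signal, low energy
    print(next_sleep_slots(bn, 90.0))  # large jump, stays awake
```

For example, a BN at 20% energy whose reading moves from 70 to 71 sleeps for 8 slots, while a subsequent jump from 71 to 90 keeps it awake (0 slots). Tying the sleep length to both quantities is what lets each BN avoid missing emergencies without waking more often than its energy allows.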
In the relay-selection part of the proposed method: (1) heterogeneous BNs are considered; (2) as a first step toward adaptability, the problem of finding optimal active groups is formulated as a Markov decision process (MDP) and then solved with a Q-learning algorithm capable of learning the time-varying behavior of energy-harvesting sources; (3) unavailable actions are removed from the action space to reduce the problem's complexity; and (4) to achieve good Q-learning performance, a reward function based on the residual energy level and neighborhood degree of BNs is defined. The method finds, for the current round, an active group of the lowest cardinality that, among such groups, maximizes the residual energy of its sensors. Simulations show that the proposed method converges well. On average, it improves network connectivity by 50% and energy efficiency by 31%, and reduces network delay by 27%.
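The Q-learning formulation above can be sketched in toy form. This is a minimal illustration under stated assumptions, not the authors' MDP: the five-node topology `ADJ`, the three energy levels, the per-node harvesting probabilities `HARVEST_P`, the state (a tuple of discretized energy levels), and the reward weights are all hypothetical. An action is a candidate active group, taken here to be a connected set in which every sleeping BN has an active neighbor (a connected dominating set); groups containing a depleted BN are pruned as unavailable actions.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical 5-node body network (undirected adjacency sets).
ADJ = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1, 4}, 4: {2, 3}}
N = len(ADJ)
MAX_LEVEL = 2                           # assumed energy levels: 0 (empty) .. 2 (full)
HARVEST_P = [0.9, 0.3, 0.6, 0.2, 0.5]   # assumed heterogeneous harvesting odds per slot


def is_connected(nodes):
    """True if the subgraph induced by `nodes` is connected (iterative DFS)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for v in ADJ[stack.pop()] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def is_valid_group(group):
    """Connected group such that every sleeping BN has an active neighbour."""
    active = set(group)
    dominated = all(u in active or ADJ[u] & active for u in ADJ)
    return dominated and is_connected(active)


# Action space: every valid active group, enumerated once.
CANDIDATES = [frozenset(g) for k in range(1, N + 1)
              for g in itertools.combinations(range(N), k) if is_valid_group(g)]


def available(state):
    """Prune groups containing a depleted BN; fall back to all groups if none remain."""
    acts = [g for g in CANDIDATES if all(state[i] > 0 for i in g)]
    return acts or CANDIDATES


def reward(state, group):
    """Assumed reward: high residual energy and degree, penalised by group size."""
    energy = sum(state[i] for i in group) / (MAX_LEVEL * len(group))
    degree = sum(len(ADJ[i]) for i in group) / ((N - 1) * len(group))
    return energy + 0.1 * degree - 0.5 * len(group) / N


def step(state, group, rng):
    """Active BNs spend one level; sleeping BNs may harvest one level."""
    return tuple(max(0, lvl - 1) if i in group
                 else min(MAX_LEVEL, lvl + (rng.random() < HARVEST_P[i]))
                 for i, lvl in enumerate(state))


def train(episodes=1000, horizon=20, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over available actions."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, group)]
    for _ in range(episodes):
        state = tuple([MAX_LEVEL] * N)
        for _ in range(horizon):
            acts = available(state)
            if rng.random() < eps:
                group = rng.choice(acts)                        # explore
            else:
                group = max(acts, key=lambda g: q[(state, g)])  # exploit
            nxt = step(state, group, rng)
            best_next = max(q[(nxt, g)] for g in available(nxt))
            target = reward(state, group) + gamma * best_next
            q[(state, group)] += alpha * (target - q[(state, group)])
            state = nxt
    return q


def greedy(q, state):
    """Pick the highest-valued available active group in `state`."""
    return max(available(state), key=lambda g: q[(state, g)])


if __name__ == "__main__":
    q = train()
    print(sorted(greedy(q, tuple([MAX_LEVEL] * N))))
```

Masking the action set before both the ε-greedy choice and the max in the update keeps the agent from learning values for groups that cannot be activated in that state, which is one way to read the abstract's "removing the unavailable action space." The size penalty in `reward` is an assumed way to push the policy toward small groups; the actual reward combines residual energy and neighborhood degree in a form the abstract does not specify.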