To address the problems of unclear node activation strategies and redundant feasible solutions in solving the target coverage problem of wireless sensor networks, a target coverage algorithm based on deep Q-learning is proposed to learn a node scheduling strategy for wireless sensor networks. First, the algorithm abstracts the construction of a feasible solution as a Markov decision process, in which the agent selects sensor nodes to activate as discrete actions according to the network environment. Second, a reward function evaluates the merit of the agent's chosen action in terms of the coverage capacity of the activated node and its residual energy. Simulation results show that, under the designed states, actions and reward mechanism, the agent's returns stabilize after 2500 rounds of training, indicating that the proposed algorithm converges. The results also show that the proposed algorithm remains effective at different network sizes, and that its network lifetime outperforms three baseline algorithms: the greedy algorithm, the maximum lifetime coverage algorithm and the self-adaptive learning automata algorithm. Moreover, this advantage becomes more pronounced as the network size, node sensing radius and initial node energy increase.
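The reward described above combines the coverage contribution of an activated node with its residual energy. A minimal sketch of one such per-step reward, assuming a weighted sum of the newly covered share of targets and normalized residual energy (all names, weights and geometry are illustrative, not the paper's exact formulation):

```python
from math import hypot

def step_reward(node_pos, residual_energy, e_max, targets, covered,
                sensing_radius, w_cov=0.7, w_en=0.3):
    """Hypothetical reward for activating one sensor node: a weighted
    combination of (a) the fraction of previously uncovered targets the
    node now covers and (b) its normalized residual energy."""
    uncovered = [t for t in targets if t not in covered]
    # Targets within the node's sensing radius that were not yet covered.
    newly = [t for t in uncovered
             if hypot(node_pos[0] - t[0], node_pos[1] - t[1]) <= sensing_radius]
    cov_term = len(newly) / len(uncovered) if uncovered else 0.0
    return w_cov * cov_term + w_en * residual_energy / e_max

# Example: a node at (0, 1) with 80% residual energy covers 1 of 3
# uncovered targets within its sensing radius of 2.
targets = [(0, 0), (5, 0), (10, 0)]
r = step_reward((0, 1), residual_energy=8.0, e_max=10.0,
                targets=targets, covered=set(), sensing_radius=2.0)
```

In such a design, the coverage term drives the agent toward nodes that reduce the set of uncovered targets, while the energy term discourages repeatedly activating low-energy nodes, which is what prolongs network lifetime.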