Abstract

An efficient multi-hop relay selection method is key to improving wireless communication reliability with multi-hop relay technology. Accordingly, this paper addresses the multi-hop relay selection problem in unknown time-varying underwater acoustic sensor networks and proposes a dynamic combinatorial multi-armed bandit (DCMAB) learning framework that finds the minimum-propagation-delay multi-hop relay strategy without any prior channel information. Compared with the strategy space of single-relay selection in static networks, the multi-hop relay learning space is both high-dimensional and dynamic. To cope with the high dimensionality, DCMAB adopts a combinatorial bandit learning scheme that lets the player learn the high-dimensional multi-hop relay strategy space by exploring the low-dimensional link sub-strategy space, thereby reducing learning complexity. To cope with the dynamics, DCMAB enables newly formed links to infer prior knowledge from the historical learning information of experienced links. Meanwhile, a probabilistic compensation mechanism intensifies exploration of newly formed links, overcoming the learning inefficiency caused by their lack of learning information. In addition, an energy-aware filtering mechanism is proposed to screen out potentially long-delay relay links, allowing the player to focus exploration and inference on high-quality links and thus search for superior multi-hop relay strategies more quickly. Finally, extensive simulation results demonstrate the superiority of the proposed algorithm.
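To make the combinatorial-bandit idea concrete, the sketch below shows a generic CUCB-style relay-path selector in Python. It is not the paper's DCMAB (the dynamic link handling, probabilistic compensation, and energy-aware filter are omitted); it only illustrates the principle the abstract describes: per-link delay statistics are learned, each link is assigned an optimistic (lower-confidence-bound) delay, and a shortest-path oracle composes the full multi-hop strategy from those low-dimensional link estimates. All names (LinkStats, shortest_path, graph, observed_delays) are illustrative assumptions, not notation from the paper.

import math
import heapq
from collections import defaultdict

class LinkStats:
    """Per-link empirical delay estimate and play count (illustrative only)."""
    def __init__(self):
        self.mean_delay = 0.0
        self.count = 0

    def update(self, delay):
        # Incremental mean of observed per-link propagation delays.
        self.count += 1
        self.mean_delay += (delay - self.mean_delay) / self.count

    def optimistic_delay(self, t):
        # Lower confidence bound on delay: optimism for a minimization objective.
        if self.count == 0:
            return 0.0  # unexplored links look maximally attractive, forcing exploration
        bonus = math.sqrt(1.5 * math.log(t) / self.count)
        return max(self.mean_delay - bonus, 0.0)

def shortest_path(graph, stats, t, src, dst):
    """Dijkstra over optimistic link delays: the combinatorial 'oracle' that
    maps per-link estimates to a full multi-hop relay strategy.
    graph: dict mapping a node to its list of neighbor nodes (assumed connected)."""
    dist = defaultdict(lambda: float("inf"))
    dist[src] = 0.0
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue
        for v in graph[u]:
            w = stats[(u, v)].optimistic_delay(t)
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    # Reconstruct the relay path as a list of directed links from src to dst.
    path, node = [], dst
    while node != src:
        path.append((prev[node], node))
        node = prev[node]
    return list(reversed(path))

# One learning round t (hypothetical usage): select a path, transmit,
# then update the statistics of each traversed link with its observed delay.
#   stats = defaultdict(LinkStats)
#   path = shortest_path(graph, stats, t, source, sink)
#   for link, delay in zip(path, observed_delays):   # observed_delays: assumed feedback
#       stats[link].update(delay)

Because the path is scored by summing per-link statistics, the learner only maintains one estimator per link rather than one per multi-hop path, which is the dimensionality reduction the abstract refers to.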
