Abstract

Relay selection schemes for underwater acoustic cooperative networks suffer significant performance degradation because they fail to adapt to incomplete information, noisy interference, and highly dynamic conditions. To address this challenge, we design a hierarchical adversarial multi-armed bandit learning framework that adds an online reward estimation layer to improve adaptive relay decision control. In the online reward estimation layer, an adaptive Kalman filter estimator handles noisy observations to produce accurate reward estimates. Meanwhile, an online prediction mechanism is designed for all relays to enrich the learning information. Furthermore, an adaptive exploration structure based on the estimation error variance is developed to reach the balance between exploration and exploitation more quickly. All gathered information is exploited to learn relay quality for decision-making. Accordingly, we present a Hierarchical Adversarial Bandit Learning (HABL) algorithm that fully exploits the heuristic interaction between the layers of the hierarchical framework. HABL integrates reward estimation, information prediction, adaptive exploration, and decision-making in a single holistic algorithm to maximize learning efficiency. As a result, the HABL-based relay selection algorithm achieves higher system throughput and lower communication cost. We further rigorously analyze the convergence of the HABL algorithm and derive an upper bound on its cumulative regret. Finally, extensive simulations demonstrate the effectiveness of HABL.
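
To make the layered idea concrete, the following is a minimal illustrative sketch and not the HABL algorithm itself, whose update rules are not given in this abstract. It assumes a plain scalar Kalman filter with a random-walk reward model per relay (not an adaptive one), an EXP3-style exponential-weights selector, and an exploration rate derived from the mean estimation error variance. All class names, parameters, and constants (`ScalarKalman`, `BanditRelaySelector`, `q`, `r`, `gamma_min`, `gamma_max`, `simulate`) are hypothetical.

```python
import math
import random


class ScalarKalman:
    """Scalar Kalman filter; the random-walk reward model is an assumption."""

    def __init__(self, q=1e-3, r=0.05, x0=0.5, p0=1.0):
        self.q, self.r = q, r      # process / measurement noise variances
        self.x, self.p = x0, p0    # reward estimate and its error variance

    def predict(self):
        # Time update: the estimate is carried forward, uncertainty grows.
        self.p += self.q

    def update(self, z):
        # Measurement update with a noisy reward observation z.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k


class BanditRelaySelector:
    """EXP3-style selector driven by filtered rewards (illustrative only)."""

    def __init__(self, n_relays, gamma_min=0.05, gamma_max=0.5, seed=0):
        self.n = n_relays
        self.filters = [ScalarKalman() for _ in range(n_relays)]
        self.log_w = [0.0] * n_relays  # log-weights for numerical stability
        self.gamma_min, self.gamma_max = gamma_min, gamma_max
        self.rng = random.Random(seed)

    def gamma(self):
        # Assumed rule: explore more while the mean error variance is large.
        mean_p = sum(f.p for f in self.filters) / self.n
        g = mean_p / (1.0 + mean_p)
        return min(self.gamma_max, max(self.gamma_min, g))

    def probabilities(self):
        g = self.gamma()
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        total = sum(w)
        return [(1.0 - g) * wi / total + g / self.n for wi in w]

    def select(self):
        probs = self.probabilities()
        relay = self.rng.choices(range(self.n), weights=probs)[0]
        return relay, probs

    def observe(self, relay, noisy_reward, probs):
        g = self.gamma()
        for f in self.filters:
            f.predict()                        # prediction step for all relays
        self.filters[relay].update(noisy_reward)
        est = min(1.0, max(0.0, self.filters[relay].x))
        # Importance-weighted exponential-weights update for the chosen relay.
        self.log_w[relay] += g * (est / probs[relay]) / self.n


def simulate(means, rounds=2000, noise_std=0.1, seed=1):
    """Toy environment: Gaussian-noisy rewards around fixed per-relay means."""
    env = random.Random(seed)
    selector = BanditRelaySelector(len(means), seed=seed)
    for _ in range(rounds):
        relay, probs = selector.select()
        reward = means[relay] + env.gauss(0.0, noise_std)
        selector.observe(relay, reward, probs)
    return selector
```

Calling `simulate([0.2, 0.5, 0.8])` returns the selector after 2000 rounds; its `probabilities()` and each filter's `x` and `p` can then be inspected. The toy environment uses stationary rewards only to keep the sketch short, whereas the adversarial setting in the abstract makes no such assumption.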

