A Bidirectional Q-learning Routing Protocol for UAV Networks

Junyu Zhou, Jing Liu, Wudong Shi, Bin Xia

doi:10.1109/wcsp52459.2021.9613295

Abstract

Unmanned aerial vehicle (UAV) networks are widely used in military and civil fields. Ad hoc On-Demand Distance Vector (AODV) routing has proven effective in ad hoc networks, but providing reliable communication in UAV networks with high mobility and limited wireless resources remains a major challenge. Q-learning can be used to optimize routing protocols, but it suffers from local optima, blind exploration, and slow convergence. To solve these problems, this paper proposes BQLAODV, a routing protocol for UAV networks based on bidirectional Q-learning, which takes node mobility and network load into account in routing decisions and improves the iteration speed and calculation accuracy of the Q-value. A bidirectional learning mechanism, which updates the Q-value toward both the source and the destination, is introduced to accelerate Q-value iteration, and information about two-hop neighbors is collected to improve the accuracy of the Q-value calculation. Because routing exploration based on the Q-value has an important impact on routing performance, a hierarchical routing exploration algorithm, which divides the nodes into three groups according to Q-value, is adopted to reduce the blindness of random exploration. Simulation results show that BQLAODV performs better in UAV networks than AODV and the Q-learning routing protocol QLAODV.
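
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the update rule below is the textbook Q-learning rule, and the learning rate `alpha`, discount `gamma`, the tier thresholds, the tier weights, and all function names are assumptions chosen for illustration. The reward is passed in as a plain number, standing in for whatever mobility and load terms the protocol actually uses.

```python
import random
from collections import defaultdict

# Q-values keyed by (node, target, next_hop); unseen entries default to 0.0.
Q = defaultdict(float)

def best_q(q, node, target):
    """Largest Q-value that `node` holds toward `target` (0.0 if none)."""
    return max((v for (n, t, _), v in q.items() if n == node and t == target),
               default=0.0)

def q_update(q_old, reward, best_next, alpha=0.5, gamma=0.8):
    """Textbook Q-learning update (assumed; alpha and gamma are illustrative)."""
    return (1 - alpha) * q_old + alpha * (reward + gamma * best_next)

def bidirectional_update(q, sender, receiver, source, destination, reward):
    """One hop sender -> receiver of a packet travelling source -> destination.

    Forward:  sender refines its estimate toward the destination via receiver.
    Backward: receiver refines its estimate toward the source via sender.
    """
    fwd = (sender, destination, receiver)
    q[fwd] = q_update(q[fwd], reward, best_q(q, receiver, destination))
    bwd = (receiver, source, sender)
    q[bwd] = q_update(q[bwd], reward, best_q(q, sender, source))

def split_into_tiers(q_by_neighbor, high=0.7, low=0.3):
    """Divide neighbors into three groups by Q-value (thresholds assumed)."""
    tiers = {"high": [], "mid": [], "low": []}
    for node, value in q_by_neighbor.items():
        if value >= high:
            tiers["high"].append(node)
        elif value >= low:
            tiers["mid"].append(node)
        else:
            tiers["low"].append(node)
    return tiers

def pick_next_hop(q_by_neighbor, rng, weights=(0.8, 0.15, 0.05)):
    """Choose a tier at random, biased toward higher tiers, then take the
    best neighbor inside it. Empty tiers are skipped. Weights are assumed."""
    tiers = split_into_tiers(q_by_neighbor)
    order = ("high", "mid", "low")
    names = [name for name in order if tiers[name]]
    tier_weights = [weights[order.index(name)] for name in names]
    chosen = rng.choices(names, weights=tier_weights, k=1)[0]
    return max(tiers[chosen], key=q_by_neighbor.get)

if __name__ == "__main__":
    bidirectional_update(Q, "A", "B", "S", "D", reward=1.0)
    print(dict(Q))
    print(pick_next_hop({"B": 0.9, "C": 0.5, "E": 0.1}, random.Random(0)))
```

In this sketch a single forwarded packet updates two table entries, which is the sense in which learning toward both endpoints could speed up Q-value iteration, and restricting exploration to a weighted choice among three groups keeps random exploration from being uniformly blind.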

Full Text