Reinforcement Learning in Multiple-UAV Networks: Deployment and Movement Design

Xiao Liu, Yue Chen, Yuanwei Liu

doi:10.1109/tvt.2019.2922849

Abstract

A novel framework is proposed for quality-of-experience driven deployment and dynamic movement of multiple unmanned aerial vehicles (UAVs). The problem of joint non-convex three-dimensional (3-D) deployment and dynamic movement of the UAVs is formulated to maximize the sum mean opinion score of ground users, and this problem is proved to be NP-hard. To solve it, a three-step approach is proposed for attaining 3-D deployment and dynamic movement of multiple UAVs. First, a genetic algorithm based K-means (GAK-means) algorithm is utilized to obtain the cell partition of the users. Second, a Q-learning based deployment algorithm is proposed, in which each UAV acts as an agent and makes its own decisions for attaining its 3-D position by learning from trial and error. In contrast to conventional genetic algorithm based learning algorithms, the proposed algorithm is capable of training the direction selection strategy offline. Third, a Q-learning based movement algorithm is proposed for the scenario in which the users are roaming. The proposed algorithm is capable of converging to an optimal state. Numerical results reveal that the proposed algorithms converge quickly, within a small number of iterations. Additionally, the proposed Q-learning based deployment algorithm outperforms the K-means and Iterative-GAKmean algorithms with low complexity.
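To make the second step more concrete, the following is a minimal sketch of tabular Q-learning for a single UAV agent choosing a movement direction on a small 3-D grid. It is not the authors' algorithm: the grid size, the seven-action set, the target position, and the reward (negative Manhattan distance to a fixed target, standing in for the sum mean opinion score) are all hypothetical choices made only for illustration, as are the function names `step`, `q_update`, `train`, and `greedy_rollout`.

```python
import random

# Hypothetical toy setup: a 3x3x3 grid of candidate UAV positions and a fixed
# "best" position standing in for the point of maximum sum mean opinion score.
GRID = 3
TARGET = (2, 2, 1)
# Seven illustrative actions: move +/-1 along x, y or z, or hover in place.
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0),
           (0, 0, 1), (0, 0, -1), (0, 0, 0)]


def step(state, action):
    """Apply a move, clipping to the grid; return (next_state, reward)."""
    nxt = tuple(min(GRID - 1, max(0, s + a)) for s, a in zip(state, action))
    # Placeholder reward (assumption): negative Manhattan distance to TARGET.
    reward = -sum(abs(n - t) for n, t in zip(nxt, TARGET))
    return nxt, reward


def q_update(q, state, a_idx, reward, nxt, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update for one observed transition."""
    best_next = max(q[nxt])
    q[state][a_idx] += alpha * (reward + gamma * best_next - q[state][a_idx])


def train(episodes=3000, steps=30, epsilon=0.3, seed=0):
    """Train offline with epsilon-greedy exploration from random start positions."""
    rng = random.Random(seed)
    states = [(x, y, z) for x in range(GRID)
              for y in range(GRID) for z in range(GRID)]
    q = {s: [0.0] * len(ACTIONS) for s in states}
    for _ in range(episodes):
        state = rng.choice(states)
        for _ in range(steps):
            if rng.random() < epsilon:
                a_idx = rng.randrange(len(ACTIONS))  # explore
            else:
                a_idx = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            nxt, reward = step(state, ACTIONS[a_idx])
            q_update(q, state, a_idx, reward, nxt)
            state = nxt
    return q


def greedy_rollout(q, start, steps=10):
    """Follow the learned greedy policy from `start` and return the final position."""
    state = start
    for _ in range(steps):
        a_idx = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _ = step(state, ACTIONS[a_idx])
    return state
```

In this sketch the Q-table is trained once and then reused by `greedy_rollout`, which loosely mirrors the idea of learning the direction selection strategy offline. A multi-UAV version would keep one such agent per UAV and replace the placeholder reward with a score computed from the users in that UAV's cell.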

Full Text