Unmanned aerial vehicles (UAVs) are regarded as an effective technology for future wireless networks. However, because the joint trajectory design and power allocation (JTDPA) problem is non-convex, it is challenging to obtain the optimal joint policy in multi-UAV networks. In this article, a multi-agent deep reinforcement learning-based approach is presented to maximize the long-term network utility while satisfying the quality of service requirements of the user equipments. Moreover, because the utility of each UAV depends on the network environment and on the actions of the other UAVs, the JTDPA problem is modeled as a stochastic game. To handle the high computational complexity caused by the continuous action space and large state space, a multi-agent deep deterministic policy gradient method is proposed to obtain the optimal policy for the JTDPA problem. Numerical results indicate that our method achieves higher network utility and system capacity than other optimization methods in multi-UAV networks, with lower computational complexity.