Abstract
Unmanned aerial vehicles (UAVs) are regarded as an effective technology for future wireless networks. However, due to the non-convexity of the joint trajectory design and power allocation (JTDPA) problem, it is challenging to attain the optimal joint policy in multi-UAV networks. In this article, a multi-agent deep reinforcement learning-based approach is presented to maximize the long-term network utility while satisfying the user equipments' quality of service (QoS) requirements. Moreover, since the utility of each UAV depends on both the network environment and the actions of the other UAVs, the JTDPA problem is modeled as a stochastic game. To cope with the high computational complexity caused by the continuous action space and the large state space, a multi-agent deep deterministic policy gradient (MADDPG) method is proposed to obtain the optimal policy for the JTDPA problem. Numerical results indicate that the proposed method achieves higher network utility and system capacity than other optimization methods in multi-UAV networks, with lower computational complexity.
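As a rough illustration of the MADDPG structure described above, the sketch below pairs a decentralized actor (mapping each UAV's local observation to a continuous trajectory/power action) with a centralized critic that scores the joint observations and actions of all UAVs during training. The network sizes, observation and action dimensions, and the PyTorch implementation are illustrative assumptions, not details taken from the paper.

```python
# Minimal MADDPG sketch (assumption: illustrative only, not the authors' exact
# architecture). Each UAV agent has a decentralized actor; a centralized critic
# sees the joint observation-action of all agents during training.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),  # bounded continuous actions
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class CentralCritic(nn.Module):
    """Q(o_1..o_N, a_1..a_N): evaluates all agents' observations and actions."""
    def __init__(self, n_agents: int, obs_dim: int, act_dim: int):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, joint_obs: torch.Tensor, joint_act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Example: 3 UAV agents, 10-dim local observation, 4-dim continuous action
# (e.g., 3D velocity plus transmit power) -- dimensions are assumptions.
n_agents, obs_dim, act_dim = 3, 10, 4
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critics = [CentralCritic(n_agents, obs_dim, act_dim) for _ in range(n_agents)]

obs = torch.randn(n_agents, obs_dim)                      # local observations
acts = torch.stack([a(o) for a, o in zip(actors, obs)])   # decentralized acting
joint_obs = obs.flatten().unsqueeze(0)
joint_act = acts.flatten().unsqueeze(0)
q_values = [c(joint_obs, joint_act) for c in critics]     # centralized critique
```

At execution time each UAV only needs its own actor and local observation, which is what keeps the per-agent decision cost low despite the centralized training step.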
Highlights
Unmanned aerial vehicles (UAVs) have been regarded as an important technology for future wireless networks [1]
Simulation results indicate that the multi-agent deep deterministic policy gradient (MADDPG) scheme can improve system capacity and network utility by over 15% with lower computational cost in multi-UAV networks, compared with other learning-based optimization approaches
In multi-UAV networks, to ensure that all user equipments (UEs) meet the quality of service (QoS) requirements from their connected UAVs, the SINR φi,m(t) of UE m should be no less than the minimum QoS threshold φ_min, i.e., φi,m(t) ≥ φ_min
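The snippet below is a minimal sketch of how such a per-UE QoS feasibility check could be expressed, assuming the received powers have already been computed from some channel model; the noise power, the threshold value phi_min, and the function name are illustrative placeholders rather than quantities from the paper.

```python
# Hedged sketch of the SINR-based QoS check: values and names are assumptions.
import numpy as np

def sinr(p_rx_serving: np.ndarray, p_rx_all: np.ndarray, noise: float) -> np.ndarray:
    """SINR of each UE: serving power over interference-plus-noise.

    p_rx_serving: (M,) received power from the associated UAV.
    p_rx_all:     (M,) total received power from all UAVs.
    """
    interference = p_rx_all - p_rx_serving
    return p_rx_serving / (interference + noise)

# Illustrative received powers in watts; phi_min is an assumed QoS threshold.
p_serv = np.array([1.0e-7, 5.0e-8, 8.0e-8])
p_all = np.array([1.2e-7, 9.0e-8, 1.0e-7])
noise = 1e-10
phi_min = 2.0
qos_satisfied = sinr(p_serv, p_all, noise) >= phi_min   # boolean per UE
```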
Summary
Unmanned aerial vehicles (UAVs) have been regarded as an important technology for future wireless networks [1]. Trajectory design, power allocation, and interference management should be studied jointly in multi-UAV networks. In this work, we propose a reinforcement learning (RL) method to tackle the JTDPA optimization problem in multi-UAV networks. Our previous work proposed a DRL approach for trajectory design and power allocation in UAV networks [22]. Most of these centralized methods, however, may incur expensive computational complexity. In our previous work [24], a multi-agent dueling double deep Q-network method was investigated to tackle the joint user association and resource allocation problem. Here, a multi-agent DRL (MADRL) method is introduced to tackle the JTDPA optimization problem in multi-UAV networks. The utility of each UAV is defined as the profit from the sum rate over its M served UEs minus the cost of its transmit power, where ρi represents the profit per unit rate and λp is the cost of the UAV's transmit power
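A minimal sketch of a utility of this form (revenue proportional to the served sum rate minus a transmit-power cost) is given below; the function name, coefficient values, and rates are hypothetical and only illustrate how ρi and λp enter the trade-off.

```python
# Hedged sketch of a per-UAV utility: rho_i * sum rate - lambda_p * power.
# The exact functional form and numbers are assumptions for illustration.
import numpy as np

def uav_utility(rates_bps: np.ndarray, tx_power_w: float,
                rho_i: float, lambda_p: float) -> float:
    """Profit from the served UE rates minus the transmit-power cost."""
    return rho_i * rates_bps.sum() - lambda_p * tx_power_w

# Example: one UAV serving 3 UEs (rates in bit/s, power in watts)
rates = np.array([2.0e6, 1.5e6, 3.0e6])
utility = uav_utility(rates, tx_power_w=0.5, rho_i=1e-6, lambda_p=0.2)
```

Under this form, raising transmit power increases the achievable rates but also the λp-weighted cost, which is exactly the trade-off the JTDPA policy has to balance.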