Decentralized resource allocation in UAV communication networks through reward based multi agent learning
Unmanned aerial vehicles (UAVs) used as aerial base stations (ABSs) can provide economical, on-demand wireless access. This work investigates dynamic resource allocation in multi-UAV-enabled communication networks with the aim of maximizing long-term rewards. Specifically, each UAV independently selects its communicating users, power levels, and sub-channels to serve ground users, without exchanging information with other UAVs. To model the unpredictability of the environment, we formulate the long-term resource allocation problem as a stochastic game that maximizes the expected reward. Each UAV in this game acts as a learning agent, and each resource allocation solution corresponds to the actions taken by the UAVs. We then develop a reward-based multi-agent learning (RMAL) framework in which each agent learns its best strategy from local observations. In particular, we propose an agent-independent method in which every agent runs its decision algorithm separately while all agents share a common Q-learning-based structure. Simulation results show, first, that appropriately chosen exploitation and exploration parameters can enhance the performance of the proposed RMAL-based resource allocation method. Second, the proposed RMAL algorithm provides acceptable performance compared with the case of full information exchange between UAVs.
Doing so achieves a satisfactory compromise between the increase in performance and the additional burden of information transmission.
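The agent-independent idea in the abstract can be illustrated with a minimal sketch. It assumes tabular Q-learning with an epsilon-greedy rule; the class name `IndependentQAgent`, the toy reward in `toy_reward`, and all parameter values are hypothetical and are not taken from the paper. Each agent keeps its own Q-table and updates it from its own reward only, so no information is exchanged between UAVs.

```python
import random
from collections import defaultdict
from itertools import product


class IndependentQAgent:
    """One UAV as a tabular Q-learning agent that only sees its own reward."""

    def __init__(self, users, powers, subchannels, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=None):
        # An action of a single UAV is a (user, power level, sub-channel) triple.
        self.actions = list(product(users, powers, subchannels))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # maps (state, action) -> value estimate
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise exploit current estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def toy_reward(own_action, other_action):
    # Hypothetical reward: 1 if the two UAVs use different sub-channels, else 0.
    return 1.0 if own_action[2] != other_action[2] else 0.0


def run(steps=2000, seed=0):
    # Two UAVs, one user, one power level, two sub-channels, one dummy state.
    agents = [IndependentQAgent([0], [1], [0, 1], seed=seed + i) for i in range(2)]
    state = 0
    for _ in range(steps):
        acts = [agent.act(state) for agent in agents]
        for i, agent in enumerate(agents):
            agent.update(state, acts[i], toy_reward(acts[i], acts[1 - i]), state)
    return agents
```

In this sketch, `epsilon` is the knob that trades exploration against exploitation, which is the kind of parameter the abstract reports as affecting performance.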
- Research Article
435
- 10.1109/twc.2019.2935201
- Aug 29, 2019
- IEEE Transactions on Wireless Communications
Unmanned aerial vehicles (UAVs) are capable of serving as aerial base stations (BSs) for providing both cost-effective and on-demand wireless communications. This article investigates dynamic resource allocation of multiple UAVs enabled communication networks with the goal of maximizing long-term rewards. More particularly, each UAV communicates with a ground user by automatically selecting its communicating users, power levels and subchannels without any information exchange among UAVs. To model the uncertainty of environments, we formulate the long-term resource allocation problem as a stochastic game for maximizing the expected rewards, where each UAV becomes a learning agent and each resource allocation solution corresponds to an action taken by the UAVs. Afterwards, we develop a multi-agent reinforcement learning (MARL) framework in which each agent discovers its best strategy according to its local observations using learning. More specifically, we propose an agent-independent method, for which all agents conduct a decision algorithm independently but share a common structure based on Q-learning. Finally, simulation results reveal that: 1) appropriate parameters for exploitation and exploration are capable of enhancing the performance of the proposed MARL based resource allocation algorithm; 2) the proposed MARL algorithm provides acceptable performance compared to the case with complete information exchanges among UAVs. By doing so, it strikes a good tradeoff between performance gains and information exchange overheads.
- Conference Article
16
- 10.1109/iccs.2018.8689218
- Dec 1, 2018
This paper considers an unmanned aerial vehicle (UAV)-enabled cellular network, in which multiple UAVs are deployed as aerial base stations (BSs) to serve users distributed on the ground. Different from prior works that ignore UAVs’ backhaul connections, we practically consider that these UAVs are connected to the core network through a ground gateway node via rate-limited multi-hop wireless backhauls. We also consider that the air-to-ground (A2G) access links from UAVs to users and the air-to-air (A2A) backhaul links among UAVs are operated over orthogonal frequency bands. Under this setup, we aim to maximize the common (or minimum) throughput among all the ground users in the downlink of this network subject to the flow conservation constraints at the UAVs, by optimizing the UAVs’ deployment locations, jointly with the bandwidth and power allocation of both the access and backhaul links. However, the common throughput maximization is a non-convex optimization problem that is difficult to solve optimally. To tackle this issue, we use the techniques of alternating optimization and successive convex programming (SCP) to obtain a locally optimal solution. Numerical results show that the proposed design significantly improves the common throughput among all ground users as compared to other benchmark schemes.
- Research Article
54
- 10.1109/mwc.001.2000174
- Nov 14, 2020
- IEEE Wireless Communications
In this article, we propose artificial intelligence (AI) enabled unmanned aerial vehicle (UAV) aided wireless networks (UAWN) for overcoming the challenges imposed by the random fluctuation of wireless channels, blocking and user mobility effects. In UAWN, multiple UAVs are employed as aerial base stations, which are capable of promptly adapting to the randomly fluctuating environment by collecting information about the users' position and tele-traffic demands, learning from the environment and acting upon the satisfaction level feedback received from the users. Moreover, AI enables the interaction among a swarm of UAVs for cooperative optimization of the system. As a benefit of the AI framework, several challenges of conventional UAWN may be circumvented, leading to enhanced network performance, improved reliability and agile adaptivity. As a further benefit, dynamic trajectory design and resource allocation are demonstrated. Finally, potential research challenges and opportunities are discussed.
- Conference Article
16
- 10.1109/iccchina.2017.8330382
- Oct 1, 2017
This paper proposes a distributed multiple relay selection scheme to maximize the satisfaction experiences of unmanned aerial vehicle (UAV) communication networks. The multi-radio and multi-channel (MRMC) UAV communication system is considered in this paper. One source UAV can select one or more relay radios, and each relay radio can be shared by multiple source UAVs equally. Without a central controller, source UAVs with heterogeneous requirements compete for channels dominated by relay radios. In order to optimize the global satisfaction performance, we model the UAV communication network as a many-to-many matching market without substitutability. We design a potential matching approach to address the optimization problem, in which optimizing the local matching process leads to the improvement of global matching results. Simulation results show that the proposed distributed matching approach yields good matching performance of satisfaction, which is close to the global optimum result. Moreover, the many-to-many potential matching approach outperforms existing schemes sufficiently in terms of global satisfaction within a reasonable convergence time.
- Research Article
71
- 10.1007/s41650-018-0040-3
- Dec 1, 2018
- Journal of Communications and Information Networks
Unmanned aerial vehicles (UAVs) have emerged as a promising solution to provide wireless data access for ground users in various applications (e.g., in emergency situations). This paper considers a UAV-enabled wireless network, in which multiple UAVs are deployed as aerial base stations (BSs) to serve users distributed on the ground. Different from prior works that ignore UAVs' backhaul connections, we practically consider that these UAVs are connected to the core network through a ground gateway node via rate-limited multi-hop wireless backhauls. We also consider that the air-to-ground (A2G) access links from UAVs to users and the air-to-air (A2A) backhaul links among UAVs are operated over orthogonal frequency bands. Under this setup, we aim to maximize the common (or minimum) throughput among all the ground users in the downlink of this network subject to the flow conservation constraints at the UAVs, by optimizing the UAVs' deployment locations, jointly with the bandwidth and power allocation of both the access and backhaul links. However, the common throughput maximization is a non-convex optimization problem that is difficult to solve optimally. To tackle this issue, we use the techniques of alternating optimization and successive convex programming (SCP) to obtain a locally optimal solution. Numerical results show that the proposed design significantly improves the common throughput among all ground users as compared to other benchmark schemes.
- Research Article
32
- 10.1109/tvt.2022.3189552
- Nov 1, 2022
- IEEE Transactions on Vehicular Technology
As an aerial base station, the unmanned aerial vehicle (UAV) has been considered a promising technology to assist future wireless communications due to its flexible, swift and low-cost features, where resource allocation is the basis for ensuring energy-efficient UAV-assisted networks. This paper formulates a joint optimization problem of user association, UAV trajectory design and power control to maximize the channel capacity among all ground users at a limited power level in a downlink transmission. To tackle the mixed-integer non-linear programming problem, this paper proposes a clustering-aided reinforcement learning approach consisting of three consecutive stages. Firstly, a modified expectation-maximization unsupervised learning algorithm is investigated to cluster the ground users, which reduces the dimensions and hence the association complexity. Then, the Kuhn-Munkres algorithm is incorporated for user association, which associates a UAV with the ground users via matching to the cluster, and assigns the UAVs to the centroid of the matching cluster for pre-placement, with the aim of speeding up the convergence of the following deep reinforcement learning algorithm. Finally, a multi-agent twin delayed deep deterministic (MATD3) policy gradient is proposed to solve the non-convex sub-problem, which determines the transmit power and designs the fine-tuned trajectory of UAVs. By incorporating a low-bias value estimation technique, the reward of the proposed MATD3 algorithm is improved. Simulation results have demonstrated that our proposed approach achieves a higher reward and converges faster than existing reinforcement learning algorithms. Besides, the clustering-aided reinforcement learning has lower computational complexity than the benchmark schemes.
- Research Article
68
- 10.1109/jiot.2021.3094651
- Feb 15, 2022
- IEEE Internet of Things Journal
In this article, we focus on a downlink cellular network, where multiple unmanned aerial vehicles (UAVs) serve as aerial base stations for ground users through frequency-division multiple access (FDMA). With user locations and channel parameters inaccessible, the UAVs coordinate to make a decision on resource allocation and trajectory design in a decentralized way. Aiming at optimizing both overall and fairness throughput, we model resource allocation and trajectory design as a decentralized partially observable Markov decision process (Dec-POMDP) and propose multiagent reinforcement learning (RL) as a solution. Specifically, we use a parameterized deep Q-network (P-DQN) for the action space comprising both discrete and continuous actions, and the QMIX framework is leveraged to aggregate each UAV’s local critics. For fairness throughput optimization, we introduce an entropy-like fairness indicator to the reward to make the total return decomposable. In addition, we further propose a novel distributed learning framework for overall throughput optimization such that each UAV can contribute its local gradient, and model training can be implemented in parallel without the need for observation data sharing among the UAVs. Simulation results show that the proposed multiagent RL approach as well as the distributed learning framework are efficient in model training and present acceptable performance close to that achieved by deterministic optimization, which relies on conventional optimization techniques with user locations and channel parameters explicitly known beforehand. For fairness throughput optimization, we also show that ground users achieve individual throughputs close to each other, which verifies the effectiveness of the proposed fairness indicator as the reward definition in the RL framework.
- Conference Article
10
- 10.1109/wcnc.2019.8886053
- Apr 1, 2019
The use of unmanned aerial vehicles (UAVs) as aerial wireless base stations has been recognized as an effective approach to on-demand deployment for providing services during a temporary event or emergency situation. High UAV mobility can be fully utilized to create line-of-sight connection and alleviate cross-link interference. While most of the prior works have studied UAV deployment, trajectory design and resource allocation strategies for improving the network throughput or energy efficiency, in this paper, we are interested in prolonging the lifetime of ground users for communications. The lifetime is defined as the communication time for the ground user before its battery is exhausted. We consider a frequency division multiplexing (FDM) uplink system where the ground users are served by multiple UAVs. We formulate a joint user association, power control, bandwidth allocation and UAV deployment problem for lifetime maximization, and propose an efficient approximation algorithm through judicious problem reformulation and successive convex approximation (SCA) techniques. For the scenario with only a single UAV, we show that the problem can be globally solved by simple bisection. Simulation results are presented to demonstrate that the proposed algorithms can achieve near-optimal performance and greatly outperform the heuristic methods.
- Conference Article
10
- 10.1109/iccw.2018.8403627
- May 1, 2018
This paper proposes an orthogonal frequency division multiplexing (OFDM) relaying wireless power transfer based protocol for energy-constrained unmanned aerial vehicle (UAV) communication networks. At the first step of the proposed protocol, the energy-constrained UAV separates the received signals into two disjoint subcarrier groups to perform energy harvesting (EH) and information decoding (ID). At the second step, the UAV forwards the received information signals with the energy harvested a priori. Based on the proposed protocol, this paper aims to find the joint optimization of subcarrier grouping and power allocation for maximizing the transmission rate under the EH constraint. We further solve such a joint resource allocation problem via dual decomposition after transforming it into an equivalent convex optimization problem. Simulation results indicate that the performance of our proposed protocol can be significantly improved for the OFDM relaying based UAV communication network.
- Research Article
1
- 10.3389/fcomp.2021.691854
- Jun 4, 2021
- Frontiers in Computer Science
In this study, we consider a security efficiency maximization problem in a multiple unmanned aerial vehicle (UAV)-aided system with mobile edge computing (MEC). Two kinds of UAVs, including multiple computing UAVs (CUAVs) and multiple jamming UAVs (JUAVs), are considered in this system. CUAVs would receive partial computation bits and send the computation results to ground users. JUAVs do not undertake computing tasks and only send interference signals to counter potential ground eavesdroppers. We jointly optimize the ground user scheduling, UAV power, and UAV trajectory to maximize the security efficiency. The original problem is non-convex and difficult to solve. We first use the Dinkelbach method combined with continuous convex approximation technology, and then propose three corresponding subproblems, including user scheduling subproblem, UAV power subproblem, and UAV trajectory problem. Further, we apply the branch and bound method to solve the user scheduling subproblem, and optimize the two remaining subproblems by introducing auxiliary variables and Taylor expansion. The simulation results show that the proposed scheme can obtain better secure off-loading efficiency with respect to the existing schemes.
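The Dinkelbach method mentioned above turns the maximization of a ratio into a sequence of subtractive problems. The following is a minimal sketch of that general idea over a finite candidate set, not the scheme of the paper; the function name `dinkelbach`, the toy rate and cost functions, and the candidate power levels are all hypothetical.

```python
import math


def dinkelbach(candidates, numerator, denominator, tol=1e-9, max_iter=100):
    """Maximize numerator(x) / denominator(x) over a finite candidate list.

    Assumes denominator(x) > 0 for every candidate.
    """
    lam = 0.0
    best = candidates[0]
    for _ in range(max_iter):
        # Inner problem: maximize the parametric objective N(x) - lam * D(x).
        best = max(candidates, key=lambda x: numerator(x) - lam * denominator(x))
        gap = numerator(best) - lam * denominator(best)
        if abs(gap) < tol:
            break  # lam equals the best achievable ratio
        lam = numerator(best) / denominator(best)
    return best, numerator(best) / denominator(best)


# Toy efficiency problem (hypothetical): bits per unit of consumed power.
powers = [0.5, 1.0, 2.0, 4.0]
rate = lambda p: math.log2(1.0 + 3.0 * p)   # assumed rate model
cost = lambda p: p + 1.0                    # transmit power plus fixed circuit power
best_power, best_ratio = dinkelbach(powers, rate, cost)
```

In a continuous setting the inner maximization would be a convex subproblem rather than a search over a list; the outer update of `lam` stays the same.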
- Conference Article
15
- 10.1117/12.2050765
- Jun 3, 2014
To date, Unmanned Aerial Vehicles (UAVs) have been widely used for numerous applications. UAVs can directly connect to ground stations or satellites to transfer data. Multiple UAVs can communicate and cooperate with each other and then construct an ad-hoc network. Multi-UAV systems have the potential to provide reliable and timely services for end users in addition to satellite networks. In this paper, we conduct a simulation study for evaluating the network performance of multi-UAV systems and satellite networks using the ns-2 networking simulation tool. Our simulation results show that UAV communication networks can achieve better network performance than satellite networks and with a lower cost and increased timeliness. We also investigate security resiliency of UAV networks. As a case study, we simulate false data injection attacks against UAV communication networks in ns-2 and demonstrate the impact of false data injection attacks on network performance.
- Research Article
69
- 10.1109/tcomm.2020.2983040
- Mar 27, 2020
- IEEE Transactions on Communications
In this paper, we study the resource allocation and trajectory design for secure unmanned aerial vehicle (UAV)-enabled communication systems, where multiple multi-purpose UAV base stations are dispatched to provide secure communications to multiple legitimate ground users (GUs) in the presence of multiple eavesdroppers (Eves). Specifically, by leveraging orthogonal frequency division multiple access (OFDMA), active UAV base stations can communicate with their desired ground users via the assigned subcarriers, while idle UAV base stations can simultaneously serve as jammers for communication security provisioning. To achieve fairness in secure communication, we maximize the average minimum secrecy rate per user by jointly optimizing the communication/jamming subcarrier allocation policy and the trajectory of UAVs, while taking into account the constraints on the minimum safety distance among multiple UAVs, the maximum cruising speed, the initial/final locations, and the existence of cylindrical no-fly zones (NFZs). The design is formulated as a mixed integer non-convex optimization problem which is generally intractable. Subsequently, a computationally efficient iterative algorithm is proposed to obtain a suboptimal solution. Simulation results illustrate that the proposed iterative algorithm can significantly improve the average minimum secrecy rate compared to various baseline schemes.
- Conference Article
4
- 10.2514/6.2008-6794
- Jun 15, 2008
In this paper we present results of a project addressing coordination and control of heterogeneous Unmanned Aerial Vehicles (UAVs). The design of this Heterogeneous Cooperative Control (HCC) system achieves two objectives: autonomous vehicle-to-target assignment and UAV trajectory planning in a dynamically varying environment to ensure simultaneous arrival of different UAVs to a target while avoiding collisions. As a means to accomplish these objectives, preliminary steps involved: (i) developing a realistic mission scenario involving coordination and collaboration among multiple UAVs for Intelligence, Surveillance and Reconnaissance (ISR) and strike; (ii) developing algorithms for sensor planning using a quality of information (QoI) technique, vehicle-to-task assignment, and cooperative path planning under pop-up threats; and (iii) developing a simulation and animation in Matlab extending our previous work to demonstrate the features of the integrated system. Simulation testing demonstrated excellent performance of the resulting coordinated control system for multiple heterogeneous UAVs. In this paper we present relevant results.
I. Introduction
As an increasing number of UAVs are used in missions ranging from reconnaissance to strike, a higher level of UAV autonomous control with heterogeneous teaming, distributed tasking, and cooperative tactics is required. These heterogeneous and cooperative control (HCC) technologies need to be developed to transform a single UAV operator into a multi-UAV supervisor. To this end, methodologies that enhance the autonomy of UAVs in response to changing environment and conditions must be explored and designed. In particular, techniques that assign tasks for UAVs of differing capabilities to achieve desired mission objectives through coordination and cooperation must be developed. These techniques should enable UAVs to adapt to configuration changes such as loss of UAVs and pop-up threats.
Techniques that plan trajectories for UAVs to accomplish the assigned tasks while avoiding spatial obstacles such as no-fly zones or pop-up threat zones and meeting time constraints such as shortest completion time or simultaneous arrival time must also be developed. Mission scenarios involving coordination and collaboration among multiple UAVs need to be devised; control architectures, simulations, and demos must be developed to verify the HCC capabilities. In this study, heterogeneous and cooperative control technologies are developed for multiple UAVs of different capabilities to autonomously accomplish assigned missions through coordination and collaboration in the presence of spatial and temporal constraints.
- Research Article
93
- 10.1109/tmc.2020.3003639
- Jun 22, 2020
- IEEE Transactions on Mobile Computing
In this paper, we propose a reinforcement learning approach of collision avoidance and investigate optimal trajectory planning for unmanned aerial vehicle (UAV) communication networks. Specifically, each UAV takes charge of delivering objects in the forward path and collecting data from heterogeneous ground IoT devices in the backward path. We adopt reinforcement learning for assisting UAVs to learn collision avoidance without knowing the trajectories of other UAVs in advance. In addition, for each UAV, we use optimization theory to find out a shortest backward path that assures data collection from all associated IoT devices. To obtain an optimal visiting order for IoT devices, we formulate and solve a no-return traveling salesman problem. Given a visiting order, we formulate and solve a sequence of convex optimization problems to obtain line segments of an optimal backward path for heterogeneous ground IoT devices. We use analytical results and simulation results to justify the usage of the proposed approach. Simulation results show that the proposed approach is superior to a number of alternative approaches.
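A no-return traveling salesman problem asks for the shortest path that starts at a given point and visits every device once without coming back to the start. The sketch below solves small instances by exhaustive search; it is only an illustration under the assumption of straight-line (Euclidean) distances in a plane, and the function name `open_tsp` and the sample coordinates are hypothetical rather than taken from the paper.

```python
import math
from itertools import permutations


def open_tsp(start, points):
    """Return the shortest visiting order from `start` through all `points`.

    Exhaustive search over all orders, so it is only practical for a
    handful of points. The path does not return to `start`.
    """
    if not points:
        return [], 0.0

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    best_order, best_len = None, math.inf
    for order in permutations(points):
        # Leg from the start to the first device, then device to device.
        length = dist(start, order[0])
        length += sum(dist(order[i], order[i + 1]) for i in range(len(order) - 1))
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order), best_len


# Hypothetical device positions on a line.
order, length = open_tsp((0, 0), [(3, 0), (1, 0), (2, 0)])
```

The factorial growth of the search is the reason practical methods replace enumeration with dynamic programming or heuristics once the number of devices grows.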
- Research Article
18
- 10.3390/electronics9081185
- Jul 23, 2020
- Electronics
With the rapid development of information technology and the increasing application of unmanned aerial vehicles (UAVs) in various fields, the security problems of UAV communication networks have become increasingly prominent. It has become an important scientific challenge to design a routing protocol that can provide efficient and reliable node-to-node packet transmission. In this paper, an efficient digital signature algorithm based on the elliptic curve cryptosystem is applied to the routing protocol, and an improved security method suitable for on-demand routing protocols is proposed. The UAV communication network was simulated on the NS2 simulation platform, and the execution efficiency and safety of the improved routing protocol were analyzed. In the simulation experiment, the routing protocols of ad-hoc on demand distance vector (AODV), security ad-hoc on demand distance vector (SAODV), and improved security ad-hoc on demand distance vector (ISAODV) are compared in terms of packet delivery rate, throughput, and end-to-end delay under normal conditions and when attacked by malicious nodes. The simulation results show that the improved routing protocol can effectively improve the security of the UAV communication network.