Optimal UAV Base Station Trajectories Using Flow-Level Models for Reinforcement Learning

Vidit Saxena, Henrik Klessig, Joakim Jalden

doi:10.1109/tccn.2019.2948324

Abstract

Cellular base stations (BS) and remote radio heads can be mounted on unmanned aerial vehicles (UAV) for flexible, traffic-aware deployment. These UAV base station networks (UAVBSN) promise an unprecedented degree of freedom that can be exploited for spectral efficiency gains as well as optimal network utilization. However, the current literature lacks realistic radio and traffic models for UAVBSN deployment planning and for performance evaluation. In this paper, we propose flow-level models (FLM) for realistically characterizing the UAVBSN performance in terms of a broad range of flow- and system-level metrics. Further, we propose a deep reinforcement learning (DRL) approach that relies on the UAVBSN FLM for learning the optimal traffic-aware UAV trajectories. For a given user traffic density and starting UAV locations, our RL approach learns offline the optimal UAV trajectories that maximize a cumulative performance metric. We then execute the learned UAV trajectories in a discrete event simulator to evaluate online UAVBSN performance. For $M = 9$ UAVs deployed in a simulated Downtown San Francisco model, where the UAV trajectories are defined by $N = 20$ discrete actions, our approach achieves approximately a three-fold increase in the average user throughput compared to the initial UAV placement, while simultaneously balancing traffic loads across the BSs.

Highlights

Due to recent advances in unmanned aerial vehicle (UAV) technology and battery lifetime, novel drone-based applications are emerging [1].
The agent should not react to individual data flow arrivals or departures, for two reasons: randomness in the arrival process can produce atypical flow constellations that lead to sub-optimal UAV base station (BS) trajectories, and data flow dynamics may unfold on much smaller time scales than UAV movements, which would make such reactive movements inefficient.
Our proposed deep reinforcement learning (DRL) approach for learning the trajectory optimization policy satisfies our learning goals of being fast, efficient and robust: (i) we directly learn the trajectory optimization policy through stochastic gradient updates of the policy parameters, which results in fast convergence to the optimum; (ii) during the random exploration phase of policy training, we use the flow-level model (FLM) to estimate the rewards for randomly selected actions.
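The idea of learning a policy directly through stochastic gradient updates of its parameters can be illustrated with a minimal sketch. This is not the article's DRL agent: it is a tabular softmax policy trained with a REINFORCE-style update on a one-step toy problem, and `toy_reward`, `train_policy`, the learning rate, and the baseline step size are all assumptions made for illustration. In the article's setting, the reward would instead be estimated from the FLM for a sequence of UAV movement actions.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(reward_fn, n_actions, steps=3000, lr=0.1, seed=0):
    """REINFORCE-style stochastic gradient ascent on a softmax policy."""
    rng = random.Random(seed)
    prefs = [0.0] * n_actions  # policy parameters, start from uniform
    baseline = 0.0             # running average reward, reduces variance
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        # Gradient of log softmax w.r.t. prefs[a]: 1{a == action} - probs[a]
        for a in range(n_actions):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += lr * advantage * (indicator - probs[a])
        baseline += 0.05 * (reward - baseline)
    return softmax(prefs)


def toy_reward(action):
    # Hypothetical stand-in for a model-based reward; action 2 is best.
    return [0.1, 0.2, 1.0, 0.3][action]


probs = train_policy(toy_reward, n_actions=4)
best_action = max(range(4), key=lambda a: probs[a])
print("learned action probabilities:", [round(p, 3) for p in probs])
print("preferred action:", best_action)
```

Because the reward comes from a model rather than from live traffic, each update is cheap, which is the property the highlight above relies on during random exploration.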

Summary

INTRODUCTION

Due to recent advances in unmanned aerial vehicle (UAV) technology and battery lifetime, novel drone-based applications are emerging [1]. As a complement or an alternative to small cells, UAV BS networks (UAVBSN) seem to be an attractive solution to time- and spatially-varying traffic demands, where network utilization is often either very low, when too many BSs are serving too few users, or extremely high, when there are far more users than planned at the time of deployment. Motivated by the complexity of the UAV trajectory planning problem, in this article we use DRL in combination with flow-level models (FLMs) to solve the problem of finding optimal UAV trajectories for improved spectral efficiency and, at the same time, for more balanced radio utilization across the UAV BSs. We use FLMs as a realistic model of the network performance in response to traffic variations, to help the DRL agent learn the relationships between the environment and key performance indicators. We show that this approach achieves a considerable improvement of the average data flow throughput in comparison to the initial deployment, which is determined according to circle packing theory.
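To make the notion of a flow-level model concrete, the following sketch computes load, mean number of active flows, and flow throughput for a single cell. It assumes the textbook M/G/1 processor-sharing model with one common data rate for all users, which is a simplification and not necessarily the FLM developed in the article (where rates depend on user location and interference). The class name `CellFlowModel` and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellFlowModel:
    """Single cell under an assumed M/G/1 processor-sharing flow model."""
    arrival_rate: float    # mean flow arrivals per second
    mean_flow_size: float  # mean flow size in bits
    capacity: float        # rate in bit/s a flow gets when alone in the cell

    @property
    def load(self) -> float:
        # Offered traffic divided by capacity; the cell is stable if < 1.
        return self.arrival_rate * self.mean_flow_size / self.capacity

    def mean_active_flows(self) -> float:
        rho = self.load
        return float("inf") if rho >= 1.0 else rho / (1.0 - rho)

    def flow_throughput(self) -> float:
        # Mean flow size over mean flow duration: capacity * (1 - load).
        return self.capacity * max(0.0, 1.0 - self.load)


cell = CellFlowModel(arrival_rate=2.0, mean_flow_size=4e6, capacity=20e6)
overloaded = CellFlowModel(arrival_rate=10.0, mean_flow_size=4e6, capacity=20e6)
print(f"load={cell.load:.2f}, throughput={cell.flow_throughput() / 1e6:.1f} Mbit/s")
```

Under this assumed model, moving a UAV so that a cell's load drops raises the throughput seen by every flow in that cell, which is why load balancing and average throughput can improve together.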

Organization of the Article

RELATED WORK

Limitations of Existing Approaches

Contributions of this Article

SYSTEM MODEL

UAV Base Station Network

Data Traffic Model

Service of a Data Flow

Radio Propagation and Signal Quality

Service Dynamics

Requirements on the Optimization Approach

Problem Formulation for the Learning Agent

Flow-Level Model for UAVBSN

Learning the Optimal Policy

NUMERICAL RESULTS

Evaluation Framework

Simulation Results

Discussion

CONCLUSION AND FUTURE WORK

Journal: IEEE Transactions on Cognitive Communications and Networking
Publication Date: Dec 1, 2019
Citations: 105
License type: other-oa

Similar Papers

Optimal 1D Trajectory Design for UAV-Enabled Multiuser Wireless Power Transfer
Yulin Hu ... Xiaopeng Yuan
IEEE Transactions on Communications | VOL. 67
01 Aug 2019

Capacity Characterization of UAV-Enabled Two-User Broadcast Channel
Qingqing Wu ... Rui Zhang
IEEE Journal on Selected Areas in Communications | VOL. 36
01 Sep 2018

Joint Optimization of Access and Backhaul Links for UAVs Based on Reinforcement Learning
Azade Fotouhi ... Lorenzo Galati Giordano
01 Dec 2019

Energy‐efficient optimisation for UAV‐aided wireless sensor networks
Meng Hua ... Zhengming Zhang
IET Communications | VOL. 13
01 May 2019
