Abstract

Cellular base stations (BS) and remote radio heads can be mounted on unmanned aerial vehicles (UAV) for flexible, traffic-aware deployment. These UAV base station networks (UAVBSN) promise an unprecedented degree of freedom that can be exploited for spectral efficiency gains as well as optimal network utilization. However, the current literature lacks realistic radio and traffic models for UAVBSN deployment planning and performance evaluation. In this paper, we propose flow-level models (FLM) for realistically characterizing UAVBSN performance in terms of a broad range of flow- and system-level metrics. Further, we propose a deep reinforcement learning (DRL) approach that relies on the UAVBSN FLM to learn optimal traffic-aware UAV trajectories. For a given user traffic density and starting UAV locations, our DRL approach learns offline the optimal UAV trajectories that maximize a cumulative performance metric. We then execute the learned UAV trajectories in a discrete event simulator to evaluate online UAVBSN performance. For M = 9 UAVs deployed in a simulated Downtown San Francisco model, where the UAV trajectories are defined by N = 20 discrete actions, our approach achieves approximately a three-fold increase in the average user throughput compared to the initial UAV placement, while simultaneously balancing traffic loads across the BSs.

Highlights

  • Due to recent advances in unmanned aerial vehicle (UAV) technology and battery lifetime, novel drone-based applications emerge [1]

  • The agent should not react to individual data flow arrivals or departures: on the one hand, randomness in the arrival process can produce atypical flow constellations that lead to sub-optimal UAV base station (BS) trajectories; on the other hand, flow dynamics may occur on much smaller time scales than UAV movements, making reactive trajectory updates inefficient

  • Our proposed deep reinforcement learning (DRL) approach for learning the trajectory optimization policy satisfies our learning goals of being fast, efficient, and robust: (i) we directly learn the trajectory optimization policy through stochastic gradient updates of the policy parameters, which results in fast convergence to the optimum; (ii) during the random exploration phase of policy training, we use the flow-level model (FLM) to estimate the rewards for randomly selected actions
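
The policy-gradient idea in (i) and (ii) can be sketched as a minimal REINFORCE-style loop. This is an illustrative simplification, not the paper's implementation: `flm_reward` is a hypothetical toy stand-in for the flow-level model, and the softmax policy over N = 20 discrete actions collapses the trajectory problem to a single action choice.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 20   # discrete UAV movement actions, as in the paper's setting


def flm_reward(action):
    # Hypothetical stand-in for the flow-level model (FLM): in the paper,
    # the FLM maps a UAV constellation to a performance metric (e.g.,
    # average flow throughput). Here, a toy reward peaked at action 12.
    return -abs(action - 12) / 12.0


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


theta = np.zeros(N_ACTIONS)   # policy parameters (softmax logits)
alpha = 0.5                   # learning rate

for episode in range(2000):
    p = softmax(theta)
    a = rng.choice(N_ACTIONS, p=p)   # random exploration: sample an action
    r = flm_reward(a)                # FLM estimates the reward offline
    grad = -p                        # grad of log pi(a | theta) for softmax
    grad[a] += 1.0
    theta += alpha * r * grad        # stochastic policy-gradient update

best = int(np.argmax(softmax(theta)))
```

In the paper, the reward would instead come from evaluating the FLM on the UAV constellation that results from the chosen action; the toy reward here merely lets the loop run end to end and concentrate the policy on the best action.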

Summary

INTRODUCTION

Due to recent advances in unmanned aerial vehicle (UAV) technology and battery lifetime, novel drone-based applications are emerging [1]. As a complement or alternative to small cells, UAV BS networks (UAVBSN) are an attractive response to time- and spatially-varying traffic demands, where network utilization is often either very low, when too many BSs serve too few users, or extremely high, when there are far more users than planned at deployment time. Motivated by the complexity of the UAV trajectory planning problem, in this article we use DRL in combination with flow-level models (FLM) to solve the complex problem of finding optimal UAV trajectories for improved spectral efficiency and, at the same time, more balanced radio utilization across the UAV BSs. We use FLMs as a realistic model of network performance in response to traffic variations, helping the DRL agent learn the relationships between the environment and key performance indicators. We show that this approach achieves a considerable improvement in the average data flow throughput compared to the initial deployment, which is determined according to circle packing theory.
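
The initial-deployment baseline mentioned above comes from circle packing theory. As a minimal sketch, assuming a square service area and M = 9 UAVs (for which the regular 3x3 grid is the classical equal-circle packing in a square), the starting positions could be computed as follows; `grid_placement` and the 3 km side length are illustrative assumptions, not the paper's exact geometry:

```python
import math


def grid_placement(m, side):
    """Initial UAV BS placement over a square service area of length `side`.

    Assumes m is a perfect square; each UAV sits at the centre of one
    cell of a k x k grid, which for m = 9 matches the 3x3 equal-circle
    packing of a square.
    """
    k = int(math.isqrt(m))
    assert k * k == m, "sketch assumes a square number of UAVs"
    step = side / k
    return [(step * (i + 0.5), step * (j + 0.5))
            for i in range(k) for j in range(k)]


positions = grid_placement(9, side=3000.0)  # e.g., a 3 km x 3 km downtown area
```

The learned trajectories then start from these positions and move the UAVs toward the traffic hotspots.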

Organization of the Article
RELATED WORK
Limitations of Existing Approaches
Contributions of this Article
SYSTEM MODEL
UAV Base Station Network
Data Traffic Model
Service of a Data Flow
Radio Propagation and Signal Quality
Service Dynamics
Requirements on the Optimization Approach
Problem Formulation for the Learning Agent
Flow-Level Model for UAVBSN
Learning the Optimal Policy
NUMERICAL RESULTS
Evaluation Framework
Simulation Results
Discussion
CONCLUSION AND FUTURE WORK
