Abstract

In unmanned aerial vehicle (UAV) applications, the UAV's limited energy supply and storage have motivated the development of intelligent, energy-conserving scheduling solutions. In this paper, we investigate energy minimization for UAV-aided communication networks by jointly optimizing data-transmission scheduling and UAV hovering time. The formulated problem is combinatorial and non-convex with bilinear constraints. To tackle the problem, we first provide an optimal algorithm (OPT) and a golden-section-search-based heuristic algorithm (GSS-HEU). Both serve as offline performance benchmarks, though they may not be suitable for online operation. To this end, from a deep reinforcement learning (DRL) perspective, we propose an actor-critic-based deep stochastic online scheduling (AC-DSOS) algorithm and develop a set of approaches to confine the action space. Compared to conventional RL/DRL, the novelty of AC-DSOS lies in handling two major issues: the exponentially growing action space and infeasible actions. Numerical results show that AC-DSOS is able to provide feasible solutions and saves around 25-30% energy compared to two conventional deep AC-DRL algorithms. Compared to the developed GSS-HEU, AC-DSOS consumes around 10% more energy but reduces the computation time from the second level to the millisecond level.
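As a concrete illustration of the golden-section component in GSS-HEU, the sketch below applies golden-section search to a one-dimensional hovering-time decision. The energy function `hover_energy` is a hypothetical toy trade-off (hover power versus transmit effort), not the paper's actual energy model, and the joint user-scheduling step is omitted.

```python
# Golden-section search over a single UAV hovering-time variable.
# Illustrative sketch only: `hover_energy` is a toy model, not the
# paper's energy function.
import math

PHI = (math.sqrt(5) - 1) / 2  # inverse golden ratio, ~0.618

def golden_section_search(f, lo, hi, tol=1e-4):
    """Minimize a unimodal function f on [lo, hi]."""
    a, b = lo, hi
    x1 = b - PHI * (b - a)
    x2 = a + PHI * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - PHI * (b - a)
            f1 = f(x1)
        else:                      # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + PHI * (b - a)
            f2 = f(x2)
    return (a + b) / 2

def hover_energy(t, p_hover=100.0, demand=50.0):
    # Toy convex trade-off: hovering longer costs hover energy but lets
    # a fixed data demand be served with less transmit effort.
    return p_hover * t + demand / t

t_star = golden_section_search(hover_energy, 0.1, 10.0)
print(f"near-optimal hovering time: {t_star:.3f} s")
```

Each iteration shrinks the search interval by a constant factor of about 0.618, which is why golden-section search suits a one-dimensional hovering-time subproblem once the scheduling decisions are fixed.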

Highlights

  • Unmanned aerial vehicles (UAVs) have attracted much attention for high-speed data transmission in dynamic, distributed, or plug-and-play scenarios, e.g., disaster rescue, live concerts, or sports events [1]

  • We propose an actor-critic-based deep stochastic online scheduling (AC-DSOS) algorithm for UAV energy savings, where the original problem is transformed into a Markov decision process (MDP)

  • Unlike conventional deep reinforcement learning (DRL), we develop a set of tailored approaches in AC-DSOS, e.g., stochastic policy quantification, action space reduction, and feasibility-guaranteed reward function design, to overcome DRL's limitations in addressing combinatorial optimization problems with multiple constraints and large action spaces (a minimal sketch of the action-confinement idea follows this list)
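The sketch below illustrates the action-space confinement idea under an assumed toy feasibility rule: only users with unmet demand may be scheduled, and infeasible users are masked out of the actor's softmax before sampling. The paper's exact constraints and network architecture are not reproduced here.

```python
# Feasibility masking over a discrete scheduling action space.
# The feasibility rule (serve only users with unmet demand) is an
# illustrative assumption, not the paper's exact constraint set.
import numpy as np

def mask_infeasible(logits, remaining_demand):
    """Renormalize actor probabilities over feasible users only."""
    mask = (remaining_demand > 0).astype(float)
    scores = np.exp(logits - logits.max()) * mask  # masked softmax
    return scores / scores.sum()                   # assumes >= 1 feasible user

rng = np.random.default_rng(0)
logits = rng.normal(size=4)                 # stub for the actor network output
remaining = np.array([0.0, 3.2, 0.0, 1.5])  # Mbits still owed to each user
probs = mask_infeasible(logits, remaining)
action = rng.choice(len(probs), p=probs)    # only feasible users can be drawn
print(probs.round(3), "-> scheduled user", action)
```

Masking shrinks the effective action space at every step and guarantees that sampled actions respect the (toy) feasibility rule, the same motivation behind AC-DSOS's action space reduction.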


Summary

INTRODUCTION

Deterministic optimization algorithms, e.g., [2]–[9], might not be suitable for fast decision making in a dynamic wireless environment. To address this issue, deep learning-based solutions have been investigated in the literature. In [19], the authors employed a deep actor-critic method to design a learning algorithm for UAV-aided systems, considering energy efficiency and user fairness.

Compared to offline optimization approaches, we provide online learning and timely energy-saving solutions based on DRL. We propose an actor-critic-based deep stochastic online scheduling (AC-DSOS) algorithm for UAV energy savings, where the original problem is transformed into a Markov decision process (MDP). Unlike conventional DRL, we develop a set of tailored approaches in AC-DSOS, e.g., stochastic policy quantification, action space reduction, and feasibility-guaranteed reward function design, to overcome DRL's limitations in addressing combinatorial optimization problems with multiple constraints and large action spaces. The code for generating the results is available online at: https://github.com/ArthuretYuan
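To make the actor-critic training loop concrete, below is a minimal tabular sketch in which the critic's temporal-difference (TD) error drives the actor's policy-gradient update, and infeasible actions are discouraged through a reward penalty. The environment dynamics, feasibility rule, and reward scale are illustrative assumptions, not the paper's MDP.

```python
# Minimal tabular actor-critic loop with a feasibility-penalized reward.
# All MDP details here (transitions, rewards, feasibility rule) are toy
# assumptions for illustration, not the paper's model.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 5, 3
theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters
v = np.zeros(n_states)                   # critic: state-value estimates
alpha_a, alpha_c, gamma = 0.05, 0.1, 0.95

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(state, action):
    """Hypothetical environment: random next state, energy-style cost."""
    next_state = rng.integers(n_states)
    feasible = not (state == 0 and action == 2)          # toy feasibility rule
    reward = -1.0 * (action + 1) if feasible else -50.0  # penalty if infeasible
    return next_state, reward

s = rng.integers(n_states)
for _ in range(2000):
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)
    s_next, r = step(s, a)
    td_error = r + gamma * v[s_next] - v[s]    # critic evaluates the transition
    v[s] += alpha_c * td_error                 # critic update
    grad_log = -probs
    grad_log[a] += 1.0                         # grad of log softmax policy
    theta[s] += alpha_a * td_error * grad_log  # actor update
    s = s_next

print("learned policy at state 0:", softmax(theta[0]).round(3))
```

After training, the policy at state 0 shifts probability mass away from the heavily penalized action, mirroring how a feasibility-guaranteed reward steers AC-DSOS away from infeasible schedules.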

System Model
UAV’s Energy Model
PROBLEM FORMULATION
User-Timeslot Scheduling
Hovering Time Allocation
Algorithm Summary
Problem Reformulation
The AC-DSOS algorithm
NUMERICAL RESULTS
Parameter Settings
Results and Analysis
CONCLUSION