Abstract

Task planning for multiple unmanned aerial vehicles (UAVs) is one of the main research topics in the field of cooperative UAV control systems. It is a complex optimization problem in which task allocation and path planning are traditionally handled separately. However, recalculating optimal results is too slow for real-time operation in dynamic environments because of the large amount of computation required, and traditional algorithms struggle with scenarios of varying scale. Moreover, traditional approaches confine task planning to a 2D environment, which deviates from the real world. In this paper, we design a 3D dynamic environment and propose a task-planning method based on a sequence-to-sequence multi-agent deep deterministic policy gradient (SMADDPG) algorithm. First, we formulate the task-planning problem as a multi-agent system based on the Markov decision process. Then, DDPG is combined with a sequence-to-sequence model to train the system to solve task assignment and path planning simultaneously according to the corresponding reward function. We compare our approach with traditional reinforcement learning algorithms in this system. Simulation results show that our approach satisfies the task-planning requirements and accomplishes tasks more efficiently in both competitive and cooperative scenarios, whether the scale is dynamic or constant.
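To make the combination described above more concrete, the following is a minimal sketch, assuming a PyTorch-style implementation: a GRU-based sequence-to-sequence actor that emits one continuous action per task step, paired with a centralized MADDPG-style critic that sees all agents' observations and actions. The class names, dimensions, and the choice of GRU are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class Seq2SeqActor(nn.Module):
    """Encodes a variable-length sequence of task observations and decodes
    one continuous action per step (illustrative stand-in for the paper's actor)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, steps):
        # obs_seq: (batch, seq_len, obs_dim), e.g. one entry per candidate task
        _, h = self.encoder(obs_seq)
        act = obs_seq.new_zeros(obs_seq.size(0), 1, self.head.out_features)
        actions = []
        for _ in range(steps):
            out, h = self.decoder(act, h)
            act = torch.tanh(self.head(out))  # continuous action in [-1, 1]
            actions.append(act)
        return torch.cat(actions, dim=1)      # (batch, steps, act_dim)

class CentralCritic(nn.Module):
    """MADDPG-style centralized critic over all agents' observations and actions."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_act):
        # all_obs: (batch, n_agents * obs_dim); all_act: (batch, n_agents * act_dim)
        return self.net(torch.cat([all_obs, all_act], dim=-1))

if __name__ == "__main__":
    actor = Seq2SeqActor(obs_dim=6, act_dim=3)  # hypothetical: 3D position + velocity in, 3D command out
    obs = torch.randn(4, 5, 6)                  # batch of 4 UAVs, 5 observed tasks each
    acts = actor(obs, steps=5)                  # one action per task step
    print(acts.shape)                           # torch.Size([4, 5, 3])
```

In this sketch the sequence decoder plays the role the abstract assigns to the sequence-to-sequence component (handling a varying number of tasks), while the centralized critic supplies the MADDPG-style training signal; the actual reward function and training loop from the paper are not reproduced here.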
