Abstract

Collaboration among multiple unmanned aerial vehicles (UAVs) has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms to multi-UAV cooperative control. To address the non-stationary environment caused by changing agent policies in multi-agent reinforcement learning, the paper presents an improved multiagent reinforcement learning algorithm: the multiagent joint proximal policy optimization (MAJPPO) algorithm, with centralized learning and decentralized execution. The algorithm uses moving-window averaging to give each agent a centralized state-value function, so that the agents can collaborate more effectively. The improved algorithm enhances collaboration and increases the sum of rewards obtained by the multiagent system. To evaluate the performance of the algorithm, we use MAJPPO to perform multi-UAV formation flight and to traverse environments with multiple obstacles. To reduce the control complexity of the UAV, we use a six-degree-of-freedom, 12-state dynamics model of the UAV with an attitude control loop. The experimental results show that MAJPPO achieves better performance and better environmental adaptability.
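The abstract describes the moving-window averaging step only at a high level, so the following Python sketch is one plausible reading rather than the authors' implementation: each agent's critic produces a local state-value estimate, the estimates are averaged across agents, and that mean is blended into a running (moving-window) value used as the centralized baseline. The class and parameter names (`CentralizedValueEstimator`, `window_coeff`) are hypothetical.

```python
import numpy as np

class CentralizedValueEstimator:
    """Hypothetical sketch of the moving-window averaging step described
    in the abstract: blend each agent's local state-value estimates into
    a shared (centralized) value used when computing advantages."""

    def __init__(self, n_agents: int, window_coeff: float = 0.9):
        self.n_agents = n_agents          # number of cooperating UAVs
        self.window_coeff = window_coeff  # weight on the running average
        self.running_value = None         # moving-window average over updates

    def update(self, local_values: np.ndarray) -> np.ndarray:
        """local_values: array of shape (n_agents, batch) holding each
        agent's V_i(s). Returns a centralized estimate of shape (batch,)."""
        assert local_values.shape[0] == self.n_agents
        mean_value = local_values.mean(axis=0)  # average across agents
        if self.running_value is None:
            self.running_value = mean_value
        else:
            # moving-window (exponential) average over successive updates
            self.running_value = (self.window_coeff * self.running_value
                                  + (1.0 - self.window_coeff) * mean_value)
        return self.running_value

# Usage: three UAVs, a batch of 4 states
est = CentralizedValueEstimator(n_agents=3)
v_local = np.random.randn(3, 4)   # stand-in for per-agent critic outputs
v_central = est.update(v_local)   # centralized value for advantage targets
```

Consistent with the centralized-learning, decentralized-execution setup named in the abstract, a shared value of this kind would be used only during training; at execution time each UAV's policy still acts on its own observations.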

Highlights

  • The autonomy and intelligent development of the coordinated control of multi-agent systems such as multi-unmanned aerial vehicle (UAV) and multi-robot systems have received more and more attention. To solve the problem of coordinated control and obstacle avoidance of multiagent systems, researchers have proposed various solutions, including rule-based methods, field methods, geometric methods, numerical optimization methods, and so on [1,2]. In recent years, the application and development of reinforcement learning (RL) in the field of robotics has attracted more and more attention

  • This paper proposes the Multiagent Joint Proximal Policy Optimization (MAJPPO) algorithm, which uses a moving-window average of the state-value functions of different agents to obtain a centralized state-value function, solving the problem of multi-UAV cooperative control

  • The main contributions of the paper are as follows: the development of the multiagent joint proximal policy optimization (MAJPPO) algorithm; and the application of the multiagent reinforcement learning (MARL) algorithm to multi-UAV formation and obstacle avoidance

Summary

Introduction

The autonomy and intelligence of coordinated control for multi-agent systems such as multi-unmanned aerial vehicle (UAV) and multi-robot systems have received increasing attention. This paper proposes the Multiagent Joint Proximal Policy Optimization (MAJPPO) algorithm, which uses a moving-window average of the state-value functions of different agents to obtain a centralized state-value function for multi-UAV cooperative control. The centralized value function of MAJPPO does not require the policies of the collaborating agents during training, thereby reducing the complexity of the algorithm. The main contributions of the paper are the development of the MAJPPO algorithm and the application of the multiagent reinforcement learning (MARL) algorithm to multi-UAV formation and obstacle avoidance. The PPO objective that MAJPPO builds on is recalled below.
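For reference, the standard PPO clipped surrogate objective (Schulman et al., 2017) is reproduced below. The last line is an assumed form of the advantage estimate in which the local value baseline is replaced by a centralized value \(\bar{V}\) obtained by moving-window averaging across agents; this substitution is our reading of the description above, not a formula quoted from the paper.

```latex
% Standard PPO clipped surrogate objective:
\begin{aligned}
r_t(\theta) &= \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}, \\
L^{\mathrm{CLIP}}(\theta) &= \mathbb{E}_t\left[\min\big(r_t(\theta)\,\hat{A}_t,\;
  \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right], \\
% Assumed MAJPPO-style advantage: the local baseline V_i is replaced by
% the centralized, moving-window-averaged value \bar{V} (our assumption):
\hat{A}_t &\approx R_t + \gamma\,\bar{V}(s_{t+1}) - \bar{V}(s_t).
\end{aligned}
```

Here \(R_t\) is the per-step reward, \(\gamma\) the discount factor, and \(\epsilon\) the clipping parameter.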

Background and Preliminary
PPO Algorithm
Multiagent
Multiagent Joint PPO Algorithm
Dynamics Model of Small UAV and Attitude Control
RL of Single UAV
Multi-UAV
Multi-UAV Formation
Network Settings
Parameter Settings
Mission Environment
Experimental Comparison and Analysis
Parameter Evaluation
Conclusions and Future Work