Abstract

With the development of unmanned combat air vehicles (UCAVs) and artificial intelligence (AI), within-visual-range (WVR) air combat confrontations utilizing intelligent UCAVs are expected to be widely used in future air combat. As controlling highly dynamic and uncertain WVR air combat from the UCAV's ground station is not feasible, it is necessary to develop an algorithm that can generate highly intelligent air combat strategies, enabling the UCAV to complete air combat missions independently. In this paper, a 1-vs.-1 WVR air combat strategy generation algorithm is proposed using the multi-agent deep deterministic policy gradient (MADDPG). The 1-vs.-1 WVR air combat is modeled as a two-player zero-sum Markov game (ZSMG). A method for predicting the position of the target is introduced into the model so that the UCAV can anticipate the target's actions and position. Moreover, to ensure that the UCAV is not limited by the constraints of a basic fighter maneuver (BFM) library, a continuous action space is adopted. In addition, a potential-based reward shaping method is proposed to improve the efficiency of the air combat strategy generation algorithm. Finally, the efficiency of the algorithm and the intelligence level of the resulting strategy are verified through simulation experiments. The results show that an air combat strategy using target position prediction is superior to one that does not.
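The potential-based reward shaping mentioned in the abstract can be illustrated with a minimal sketch. The standard form adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, which provably preserves the optimal policy. The potential function below (based only on distance to the target) and the constants are illustrative assumptions; the paper's actual potentials combine orientation, distance, and velocity terms and are not reproduced here.

```python
# Sketch of potential-based reward shaping: r' = r + gamma * Phi(s') - Phi(s).
# Phi and the constants below are hypothetical, for illustration only.
GAMMA = 0.99  # discount factor (assumed value)

def potential(distance_to_target: float, max_distance: float) -> float:
    """Toy potential: larger when the UCAV is closer to the target."""
    return 1.0 - min(distance_to_target / max_distance, 1.0)

def shaped_reward(base_reward: float, dist: float, next_dist: float,
                  max_dist: float = 10_000.0) -> float:
    """Add the shaping term F(s, s') = gamma * Phi(s') - Phi(s) to the base reward.

    Because F is a potential difference, the shaped reward leaves the
    optimal policy of the underlying Markov game unchanged.
    """
    shaping = GAMMA * potential(next_dist, max_dist) - potential(dist, max_dist)
    return base_reward + shaping
```

With these assumed values, a transition that closes the distance to the target (e.g. 5000 m to 4000 m) yields a positive shaping term, giving the learner a denser signal than the sparse terminal reward alone.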

Highlights

  • With the development of unmanned combat air vehicles (UCAVs), the role of UCAVs is becoming increasingly significant in the field of combat [1]

  • We propose a UCAV 1-vs.-1 within visual range (WVR) air combat strategy generation algorithm based on multi-agent deep deterministic policy gradient (MADDPG)

  • The intercept time is measured from the beginning of the air combat simulation until a UCAV has established a position of advantage


Summary

Introduction

With the development of unmanned combat air vehicles (UCAVs), the role of UCAVs is becoming increasingly significant in the field of combat [1]. In the vast majority of existing studies, only the current UCAV motion state is used as the state space for reinforcement learning, which makes it difficult for a trained air combat strategy to learn to predict target maneuvers and make deliberate air combat decisions. As a result, such a strategy lacks the intelligent behavior needed to seize a dominant position in advance. To solve these problems, in this article we propose a UCAV 1-vs.-1 WVR air combat strategy generation method based on a multi-agent reinforcement learning method that incorporates target maneuver prediction. We also introduce a potential-based reward shaping method to improve the efficiency of the UCAV maneuvering strategy generation algorithm.
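The target maneuver prediction described above can be sketched in its simplest form as constant-velocity extrapolation of the target's state, with the predicted position appended to the observation given to the policy. This is an assumed baseline, not the paper's actual predictor (the paper's prediction-interval estimation is covered in its own section); the function names here are hypothetical.

```python
import numpy as np

def predict_target_position(pos: np.ndarray, vel: np.ndarray, dt: float) -> np.ndarray:
    """Constant-velocity extrapolation of the target position over dt seconds.

    A stand-in for the paper's target position prediction; real maneuvering
    targets would require a richer motion model.
    """
    return pos + vel * dt

def build_observation(own_state: np.ndarray, target_pos: np.ndarray,
                      target_vel: np.ndarray, dt: float) -> np.ndarray:
    """Augment the UCAV's own state with the predicted target position,
    so the learned strategy can act on where the target will be."""
    predicted = predict_target_position(target_pos, target_vel, dt)
    return np.concatenate([own_state, target_pos, predicted])
```

Feeding the predicted position alongside the current one lets the policy learn to maneuver toward where the target is going rather than where it is.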

Zero-Sum Markov Games
Air Combat Reward Function and Termination Condition Designing
Prediction Interval Estimation
Target Position Prediction
Maneuvering Strategy Generation Algorithm Outline
Reward Shaping
Reward Shaping for Orientation
Reward Shaping for Distance
Reward Shaping for Velocity
Prioritized Replay Memory
Air Combat Simulation Platform Construction
Maneuvering Strategy Generation Algorithm Parameters Setting
UCAV Performance Parameters Setting
Evaluation Metrics of Air Combat Strategy
Comparative Analysis of Training Process
Evaluation Metrics
Conclusions