Abstract

The ultimate goal of military intelligence is to equip the command and control (C2) system with the decision-making art of excellent human commanders while making it more agile and stable than a human. The intelligent commander Alpha C2 solves the dynamic decision-making problem in complex air defense scenarios using a deep reinforcement learning framework. Unlike traditional C2 systems that rely on expert rules and decision-making models, Alpha C2 interacts with digital battlefields that are close to the real world and generates its own learning data. Taking the integrated states of multiple parties as input, a gated recurrent unit network introduces historical information, and an attention mechanism selects the object of each action, making the output decisions more reliable. Without learning from human combat experience, the neural network is trained in fixed- and random-strategy scenarios with a proximal policy optimization algorithm. Finally, 1,000 rounds of offline confrontation were conducted on a digital battlefield; the results show that Alpha C2 trained with a random strategy generalizes better and can defeat its opponent with a higher winning rate than an Expert C2 system (72% vs. 21%). Its use of resources is also more reasonable than that of Expert C2, reflecting the flexible and changeable art of command.
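The pipeline the abstract describes, a recurrent unit that summarizes the history of battlefield states before a policy head acts on it, can be illustrated with a minimal gated recurrent unit cell. This is a generic numpy sketch, not the paper's network: the dimensions, the initialization, and the placeholder observation vectors are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell: h_t summarizes the battlefield state history."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        # One weight matrix each for the update gate z, reset gate r,
        # and candidate state n, acting on [x; h].
        self.Wz = rng.uniform(-s, s, (hidden_dim, input_dim + hidden_dim))
        self.Wr = rng.uniform(-s, s, (hidden_dim, input_dim + hidden_dim))
        self.Wn = rng.uniform(-s, s, (hidden_dim, input_dim + hidden_dim))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                           # update gate
        r = sigmoid(self.Wr @ xh)                           # reset gate
        n = np.tanh(self.Wn @ np.concatenate([x, r * h]))   # candidate state
        return (1 - z) * n + z * h   # interpolate old and new state

# Roll the cell over a short sequence of hypothetical observations.
cell = GRUCell(input_dim=4, hidden_dim=8)
h = np.zeros(8)
for t in range(5):
    obs = np.ones(4) * t   # placeholder observation vector
    h = cell.step(obs, h)
print(h.shape)  # (8,)
```

The final hidden vector `h` is what a policy head (in the paper, the attention-based target selector) would consume in place of the raw current frame.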

Highlights

  • The battlefield is full of antagonism, complexity, and uncertainty [1]

  • Alpha C2 interacted with the digital battlefield online to generate learning data

  • Digital battlefield environment: intelligent agents must interact with the environment during training, and this is the main restriction on the development of military intelligence


Summary

INTRODUCTION

The battlefield is full of antagonism, complexity, and uncertainty [1]. How to deal with uncertainty is significant to decision making in military operations (Q. Fu et al.: Alpha C2—An Intelligent Air Defense Commander Independent of Human Decision-Making). The commander must make rational use of air combat resources and avoid missing key targets and repeated shooting. This weapon-target assignment (WTA) problem has been proved to be NP-complete [19] and can be divided into the categories of static WTA [20] and shoot-look-shoot-based dynamic WTA [21]. Alpha C2 interacts with the digital battlefield online to generate learning data. After the attackable blue-party target states are processed through a max-pooling layer, the target (the object of the action) to be attacked is selected using the attention mechanism.
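The target-selection step described above, pooling the attackable blue-party target states and then picking the object of the action with attention, can be sketched as masked dot-product attention. The sketch below is a hedged illustration under stated assumptions: `select_target`, the feature dimensions, the way the max-pooled summary enters the query, and the mask semantics are hypothetical, not taken from the paper.

```python
import numpy as np

def select_target(h, target_feats, mask):
    """Pick the blue-party target to attack.

    h            : commander state (e.g. a GRU hidden vector), shape (d,)
    target_feats : per-target feature vectors, shape (n, d)
    mask         : True where a target is currently attackable
    """
    # Max-pool over targets gives a permutation-invariant battlefield summary;
    # adding it to the query is a stand-in for the paper's fusion step.
    query = h + target_feats.max(axis=0)
    scores = target_feats @ query                  # dot-product attention scores
    scores = np.where(mask, scores, -np.inf)       # exclude unattackable targets
    e = np.exp(scores - scores[mask].max())        # numerically stable softmax
    probs = np.where(mask, e / e[mask].sum(), 0.0)
    return int(np.argmax(probs)), probs

rng = np.random.default_rng(1)
h = rng.standard_normal(6)                # hypothetical commander state
targets = rng.standard_normal((4, 6))     # four blue-party targets
mask = np.array([True, True, False, True])  # target 2 cannot be engaged
idx, probs = select_target(h, targets, mask)
```

Masking with `-inf` before the softmax guarantees that a destroyed or out-of-range target receives zero probability, so the greedy choice and any sampled choice are always legal actions.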

GATED RECURRENT UNIT
ATTENTION MECHANISM
ASYNCHRONOUS PROXIMAL POLICY OPTIMIZATION
RED PARTY FORCE SETTINGS AND CAPABILITY INDICATORS
Findings
CONCLUSION AND FUTURE WORK