Abstract

This paper proposes a new reinforcement learning approach for executing combat unmanned aerial vehicle (CUAV) missions. We consider missions with three goals: guided-missile avoidance, shortest-path flight, and formation flight. In reinforcement learning, the representation of the agent’s current state is important. We propose a novel method that uses the coordinates and angle of a CUAV to represent its state effectively. Furthermore, we develop a reinforcement learning algorithm with enhanced exploration through amplification of the imitation effect (AIE), which combines self-imitation learning (SIL) and random network distillation (RND). We assert that these two algorithms complement each other and that combining them amplifies the imitation effect for exploration. Empirical results show that the proposed AIE approach is highly effective at finding a CUAV’s shortest flight path while avoiding enemy missiles. Test results confirm that with our method, a single CUAV reaches its target from its starting point 95% of the time, and a squadron of four simultaneously operating CUAVs reaches the target 70% of the time.
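
As a rough illustration of the state representation described in the abstract, the sketch below packs a CUAV’s coordinates and angle into a feature vector. The normalization constant and the sin/cos encoding of the angle are assumptions made for illustration, not the paper’s exact encoding.

    import numpy as np

    def cuav_state(x, y, angle, arena_size=1000.0):
        """Encode a CUAV's coordinates and angle as a state vector.

        arena_size and the sin/cos angle encoding are illustrative
        assumptions; the paper's exact representation may differ.
        """
        return np.array(
            [
                x / arena_size,  # normalized x coordinate
                y / arena_size,  # normalized y coordinate
                np.sin(angle),   # angle split into sin/cos so the
                np.cos(angle),   # encoding is continuous at 0 / 2*pi
            ],
            dtype=np.float32,
        )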

Highlights

  • Combat unmanned aerial vehicles (CUAVs) will be an important resource in future military systems because they can replace humans in performing dangerous or important tasks

  • The results show that AIE2 and AIE3 succeeded in converging to the desired policy, while A2C-based SIL (ASIL) and AIE1 fell into a local minimum in one of two trials and one of three trials, respectively

  • We have proposed a reinforcement learning (RL) algorithm that guides a CUAV to achieve multiple goals through actions similar to human behavior

Summary

INTRODUCTION

Combat unmanned aerial vehicles (CUAVs) will be an important resource in future military systems because they can replace humans in performing dangerous or important tasks. It is anticipated that CUAVs will be able to determine reasonable actions by recognizing and evaluating changes in a military environment, such as enemy surface-to-air threats, in real time and without human intervention. Such CUAVs will be able to carry out tasks such as reconnaissance and target attacks [1]. An intrinsic-reward approach is efficient for exploration because a network trained to predict the state can drive the agent to behave differently from its previous actions.
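
As a minimal sketch of this intrinsic-reward idea (random network distillation, one of the two components of AIE), the code below computes the intrinsic reward as the prediction error between a trainable predictor network and a fixed, randomly initialized target network. The layer sizes and class name are illustrative assumptions, not the paper’s configuration.

    import torch
    import torch.nn as nn

    class RNDIntrinsicReward(nn.Module):
        """Random network distillation (RND) intrinsic reward.

        A fixed, randomly initialized target network embeds each state;
        a predictor network is trained to match that embedding. The
        prediction error is large for novel states, so using it as an
        intrinsic reward encourages exploration.
        """

        def __init__(self, state_dim, embed_dim=64):
            super().__init__()
            self.target = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
            self.predictor = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
            # The target network stays fixed; only the predictor is trained.
            for p in self.target.parameters():
                p.requires_grad = False

        def forward(self, state):
            # Intrinsic reward = squared prediction error for this state.
            target_feat = self.target(state)
            pred_feat = self.predictor(state)
            return ((pred_feat - target_feat) ** 2).mean(dim=-1)

During training, this same prediction error would be minimized with respect to the predictor’s parameters, so frequently visited states yield shrinking intrinsic rewards while novel states remain rewarding.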

RELATED WORK
CATASTROPHIC FORGETTING IN RND
PROPOSED ALGORITHMS FOR THE CUAV ENVIRONMENT
HARD EXPLORATION IN A 2D ENVIRONMENT
EXPERIMENTS ON CUAV MISSION EXECUTION
Findings
CONCLUSION AND FUTURE WORK