A hierarchical reinforcement learning method for missile evasion and guidance

Mengda Yan, Dongyuan Hu, Ying Zhang, Rennong Yang, Longfei Yue

doi:10.1038/s41598-022-21756-6


Open Access

https://doi.org/10.1038/s41598-022-21756-6


Journal: Scientific Reports
Publication Date: Nov 7, 2022
Citations: 4
License type: open-access

Affiliation: Air Force Engineering University

Abstract

This paper proposes a missile manoeuvring algorithm based on hierarchical proximal policy optimization (PPO) reinforcement learning, which enables a missile to guide itself to a target while evading an interceptor. Following the idea of task hierarchy, the agent has a two-layer structure in which low-level agents control basic actions and are themselves controlled by a high-level agent. The low level consists of two agents, a guidance agent and an evasion agent, which are trained in simple scenarios and then embedded in the high-level agent. The high level is a policy selector agent, which chooses one of the low-level agents to activate at each decision moment. Each agent has its own reward function, which takes into account guidance accuracy, flight time, and energy consumption, as well as a field-of-view constraint. Simulations show that PPO without a hierarchical structure cannot complete the task, whereas the hierarchical PPO algorithm achieves a 100% success rate on a test dataset. The agent adapts well and is robust to the second-order lag of the autopilot and to measurement noise. Compared with a traditional guidance law, the reinforcement learning guidance law achieves satisfactory guidance accuracy and significant advantages in average flight time and average energy consumption.
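
The two-layer structure described in the abstract can be illustrated with a minimal Python sketch. It shows only the control flow: a high-level selector picks one low-level policy at each decision moment, and that policy produces the action. Everything concrete here is an assumption made for illustration and is not taken from the paper: the observation keys, the `MAX_ACCEL` limit, the range threshold, and the hand-written rules that stand in for the trained PPO networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical observation layout and action type (not from the paper).
Observation = Dict[str, float]
Policy = Callable[[Observation], float]  # returns a lateral acceleration command

MAX_ACCEL = 100.0  # assumed acceleration limit, arbitrary units


def guidance_policy(obs: Observation) -> float:
    """Stand-in for the trained guidance agent: steer on target line-of-sight rate."""
    return 3.0 * obs["target_los_rate"]


def evasion_policy(obs: Observation) -> float:
    """Stand-in for the trained evasion agent: full command against the interceptor's LOS rate."""
    return -MAX_ACCEL if obs["interceptor_los_rate"] >= 0.0 else MAX_ACCEL


def selector_policy(obs: Observation) -> int:
    """Stand-in for the trained policy selector: evade when the interceptor is close."""
    return 1 if obs["interceptor_range"] < 5000.0 else 0


@dataclass
class HierarchicalAgent:
    """High-level selector wrapping a fixed set of low-level policies."""
    selector: Callable[[Observation], int]
    low_level: Sequence[Policy]

    def act(self, obs: Observation) -> Tuple[int, float]:
        # The selector activates exactly one low-level policy per decision moment.
        index = self.selector(obs)
        return index, self.low_level[index](obs)


if __name__ == "__main__":
    agent = HierarchicalAgent(selector_policy, [guidance_policy, evasion_policy])
    obs = {
        "target_los_rate": 0.1,
        "interceptor_los_rate": 0.2,
        "interceptor_range": 10000.0,
    }
    print(agent.act(obs))
```

In the method the abstract describes, each of the three callables would be a separately trained PPO policy, with the low-level agents trained first in simple scenarios and then embedded in the high-level agent. The sketch keeps them as plain functions so that the selection mechanism is visible on its own.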

Full Text