Abstract

The rapid development of hypersonic vehicles poses great challenges to missile defense systems. Because successful interception depends heavily on the terminal guidance law, research on guidance laws for intercepting highly maneuvering targets has attracted increasing attention. Artificial intelligence techniques, such as deep reinforcement learning (DRL), have been widely applied to improve the performance of guidance laws. However, existing DRL guidance laws rarely account for the partial observability of onboard sensors, which limits their engineering application. In this paper, a deep recurrent reinforcement learning (DRRL)-based guidance method is investigated to address the intercept guidance problem against maneuvering targets under partial observability. A sequence of previous state observations is used as the input to the policy network, and a recurrent layer is introduced into the networks to extract the hidden information behind the temporal sequence and support policy training. The guidance problem is formulated as a partially observable Markov decision process, and a range-weighted reward function that accounts for the line-of-sight rate and energy consumption is designed to guarantee convergence of policy training. The effectiveness of the proposed DRRL guidance law is validated by extensive numerical simulations.
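The abstract does not give the exact form of the range-weighted reward, but the idea it describes (penalizing the line-of-sight rate more heavily as the interceptor closes on the target, while also penalizing control energy) can be illustrated with a minimal sketch. All function names, weights, and the specific weighting form below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def range_weighted_reward(los_rate, accel_cmd, rel_range,
                          r_ref=10_000.0, w_los=1.0, w_energy=0.01):
    """Hypothetical range-weighted reward sketch (not the paper's formula).

    los_rate   : line-of-sight angular rate [rad/s]
    accel_cmd  : commanded lateral acceleration [m/s^2] (energy proxy)
    rel_range  : current interceptor-target range [m]
    r_ref      : reference range used to scale the weight (assumed)
    """
    # Assumed weighting: LOS-rate penalty grows as the range closes,
    # so near-intercept LOS-rate errors dominate the reward signal.
    range_weight = r_ref / max(rel_range, 1.0)
    los_penalty = w_los * range_weight * los_rate ** 2
    energy_penalty = w_energy * accel_cmd ** 2
    return -(los_penalty + energy_penalty)
```

Under this sketch, the same LOS-rate error yields a more negative reward at short range than at long range, which matches the abstract's stated goal of prioritizing LOS-rate regulation near intercept while discouraging unnecessary energy expenditure throughout.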
