Abstract

A novel homing guidance law against maneuvering targets based on the deep deterministic policy gradient (DDPG) algorithm is proposed. The guidance law is an end-to-end policy that maps the engagement state directly to the commanded acceleration of the interceptor. First, the kinematics of the interception process are formulated as a Markov decision process (MDP) suitable for deep reinforcement learning (DRL). The training environment, state space, action space, and network structure are then designed. Only measurements of the line-of-sight (LOS) angles and LOS rotational rates are used as state inputs, which greatly simplifies the state-estimation problem. Next, a Gaussian shaped reward based on the LOS rotational rate and a terminal reward based on the zero-effort miss (ZEM) are designed, completing the training and testing simulation environment. DDPG is then applied to this RL problem to obtain the guidance law. Finally, the performance of the proposed RL guidance law is validated through numerical simulation examples: it outperforms both the classical true proportional navigation (TPN) law and an RL guidance policy trained with a deep Q-network (DQN).
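As a rough illustration of the state, action, and reward design summarized above, the following sketch shows one plausible realization in PyTorch. It assumes a 3-D engagement (four state inputs: two LOS angles and their rotational rates; two acceleration commands); the network dimensions and the constants `sigma`, `hit_radius`, `bonus`, and `a_max` are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the end-to-end policy and reward design described
# above, assuming a 3-D engagement whose state is the two LOS angles
# and their rotational rates. Network sizes, sigma, hit_radius, bonus,
# and a_max are illustrative assumptions, not values from the paper.
import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):
    """DDPG actor: maps the engagement state directly to a bounded
    commanded acceleration (the end-to-end guidance policy)."""
    def __init__(self, state_dim: int = 4, action_dim: int = 2,
                 a_max: float = 100.0):
        super().__init__()
        self.a_max = a_max  # acceleration limit in m/s^2 (assumed)
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.a_max * self.net(state)  # scale to physical units

def gaussian_step_reward(los_rate: np.ndarray, sigma: float = 0.05) -> float:
    """Shaped per-step reward that peaks as the LOS rotational rate is
    driven to zero, encouraging a parallel-approach intercept geometry."""
    return float(np.exp(-np.sum(los_rate**2) / (2.0 * sigma**2)))

def terminal_reward(zem: float, hit_radius: float = 1.0,
                    bonus: float = 100.0) -> float:
    """Terminal reward based on the zero-effort miss (ZEM): a large
    bonus for a hit, otherwise a penalty growing with the miss."""
    return bonus if abs(zem) < hit_radius else -abs(zem)
```

The Gaussian term densely rewards nulling the LOS rotational rate, consistent with the parallel-approach principle behind proportional navigation, while the ZEM-based terminal reward captures the otherwise sparse hit-or-miss outcome of each episode.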
