Abstract

A midcourse maneuver controller is obtained using deep reinforcement learning to maintain the survivability of a ballistic missile. First, the midcourse is abstracted as a Markov decision process (MDP) with an unknown system state equation. Then, a controller formed by a Dueling Double Deep Q (D3Q) neural network is used to approximate the state-action value function of the MDP. So that deep reinforcement learning can improve the controller's intelligence, the state space, action space, and instant reward function of the MDP are customized. The controller takes the real-time situation as input and outputs the ignition states of the pulse motors. Offline training shows that deep reinforcement learning converges to the optimal strategy after approximately 65 hours. Online tests demonstrate the controller's ability to avoid an interceptor intelligently and to account for the re-entry error. In scenarios with multiple random factors, the controller achieved a penetration probability of 100% and a mean re-entry error of less than 5000 m.
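The dueling double Q-learning structure described above can be illustrated with a minimal sketch. The following assumes PyTorch; the state dimension, the discrete action set (ignition combinations of the pulse motors), the network widths, and the discount factor are illustrative placeholders rather than values taken from the paper.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling Q-network: a shared trunk followed by separate value
    and advantage heads, recombined into Q(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)           # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v = self.value(h)
        a = self.advantage(h)
        # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_target(online: DuelingQNet, target: DuelingQNet,
                    next_state: torch.Tensor, reward: torch.Tensor,
                    done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double-DQN bootstrap target: the online network selects the
    greedy next action, the target network evaluates it."""
    with torch.no_grad():
        greedy = online(next_state).argmax(dim=1, keepdim=True)
        q_next = target(next_state).gather(1, greedy).squeeze(1)
        return reward + gamma * (1.0 - done) * q_next
```

In this sketch the double-Q target decouples action selection from action evaluation, which is the standard way to reduce the overestimation bias of plain Q-learning; the dueling heads separate the state value from the per-action advantage, here interpreted as the value of each candidate pulse-motor ignition state.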

Highlights

  • Ballistic missiles have a long flight time in midcourse and a fixed trajectory

  • To eliminate the re-entry error caused by the midcourse maneuver, Reference [8] used the remaining pulse motors to return to a preset ballistic trajectory

  • A rapid trajectory optimization algorithm was proposed for the whole course under multiple constraints and multiple detection zones


INTRODUCTION

Ballistic missiles have a long flight time in midcourse and a fixed trajectory, so various countries regard midcourse interception as the core strategy of their missile defense systems [1]~[3]. Reference [7] proposed a midcourse penetration strategy using an axial impulse maneuver and provided a detailed trajectory design method. This penetration strategy does not require lateral pulse motors; the idea is to design, before launch, a trajectory that evades the enemy's detection zone. Solving this trajectory design problem is a complex nonlinear programming problem with multiple constraints and multiple stages. Based on accurate models of a penetrating spacecraft and an interceptor, Reference [12] proposed a guidance law using a state-dependent Riccati equation (SDRE). This approach achieved superior combat effectiveness compared with classic differential game theory.

PROBLEM FORMULATION
ANALYSIS OF THE MIDCOURSE PENETRATION
MARKOV DECISION PROCESS WITH UNKNOWN SYSTEM STATE EQUATION
TRAINING ALGORITHM
CONCLUSION