Current research on decision-making strategies for air combat focuses on algorithm performance, while the design of the action space is often neglected: actions are typically fixed in amplitude and limited in number to improve convergence efficiency, so the resulting strategy cannot fully exploit the aircraft's maneuverability. In this paper, a decision-making strategy for close-range air combat based on reinforcement learning with variable-scale actions is proposed, in which the actions are variable-scale virtual pursuit angles and speeds. First, a trajectory prediction method is proposed that combines real-time prediction, correction, and error judgment. A back-propagation (BP) neural network and a long short-term memory (LSTM) neural network serve as the base prediction network and the correction prediction network, respectively. Second, the past, current, and predicted future positions of the target aircraft are used as virtual pursuit points and converted into virtual pursuit angles, which serve as track-angle commands, using an angle guidance law. Then, the proximal policy optimization (PPO) algorithm is applied to train the agent. Simulation results show that an attacking aircraft using the proposed strategy achieves a higher win rate in air combat and fully exploits its maneuverability.
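As a rough illustration of the virtual-pursuit-point step described above, the sketch below converts a selected pursuit point (a past, current, or predicted future target position) into heading and flight-path angle commands via a simple line-of-sight formulation. The abstract does not specify the paper's exact angle guidance law, so the function name, coordinate convention, and geometry here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def virtual_pursuit_angles(attacker_pos, pursuit_point):
    """Hypothetical line-of-sight conversion of a virtual pursuit point
    into track-angle commands (heading and flight-path angle).

    The actual angle guidance law used in the paper is not given in the
    abstract; this is only a plausible sketch of the idea.
    """
    rel = np.asarray(pursuit_point, dtype=float) - np.asarray(attacker_pos, dtype=float)
    # Heading (azimuth) command in the horizontal plane.
    psi_cmd = np.arctan2(rel[1], rel[0])
    # Flight-path (elevation) command relative to the horizontal plane.
    gamma_cmd = np.arctan2(rel[2], np.linalg.norm(rel[:2]))
    return psi_cmd, gamma_cmd

# Example: the RL agent picks one variable-scale pursuit point (e.g. an
# LSTM-predicted future target position) and the resulting angles are
# issued as track-angle commands to the aircraft model.
psi, gamma = virtual_pursuit_angles([0.0, 0.0, 1000.0], [2000.0, 1500.0, 1200.0])
```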