With the improvement of UAV performance and intelligence in recent years, it is particularly important for unmanned aerial vehicles (UAVs) to improve the ability of autonomous air combat. Aiming to solve the problem of how to improve the autonomous air combat maneuver decision ability of UAVs so that it can be close to manual manipulation, this paper proposes an autonomous air combat maneuvering decision method based on the combination of simulated operation command and the final reward value deep deterministic policy gradient (FRV-DDPG) algorithm. Firstly, the six-degree-of-freedom (6-DOF) model is established based on the air combat process, UAV motion, and missile motion. Secondly, a prediction method based on the Particle swarm optimization radial basis function (PSO-RBF) is designed to simulate the operation command of the enemy aircraft, which makes the training process more realistic, and then an improved DDPG strategy is proposed, which returns the final reward value to the previous reward value in a certain proportion of time for offline training, which can improve the convergence speed of the algorithm. Finally, the effectiveness of the algorithm is verified by building a simulation environment. The simulation results show that the algorithm can improve the autonomous air combat maneuver decision-making ability of UAVs.
Read full abstract