In air combat missions, unmanned combat aerial vehicles (UCAVs) must take strategic actions to establish a combat advantage that enables effective tracking and attacking of enemy UCAVs. Many reinforcement learning algorithms have been applied to UCAV air combat missions. However, most of them select policies based only on the current state of both sides, so they cannot track and attack effectively when the enemy performs large-angle maneuvers. In addition, these algorithms cannot adapt to different combat situations, which leaves the UCAV at a disadvantage in some cases. To solve these problems, this paper proposes a predictive air combat decision model with segmented reward allocation for air combat tracking and attacking. Building on the air combat environment, we propose the prediction soft actor-critic (Pre-SAC) algorithm, which combines the predicted states of the enemy with the states of the UCAV for model training. This enables the UCAV to anticipate the enemy UCAV's next move and establish a greater air combat advantage. Furthermore, by combining a segmented reward allocation model with the Pre-SAC algorithm, we propose the segmented reward allocation soft actor-critic (Sra-SAC) algorithm, which addresses the inability of UCAVs to adapt to different situations. The results show that the prediction-based Sra-SAC algorithm with segmented reward allocation outperforms the traditional soft actor-critic (SAC) algorithm in terms of overall reward, travel distance, and relative advantage.
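
The two ideas summarized in the abstract, feeding a predicted enemy state into the policy input and switching the reward weighting by situation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the abstract does not specify the predictor, the state layout, the segments, or the weights. The linear extrapolation in `predict_enemy_state`, the distance threshold `near_threshold`, and the weight values are all hypothetical and are not the authors' design.

```python
from typing import List, Sequence


def predict_enemy_state(prev: Sequence[float], curr: Sequence[float]) -> List[float]:
    """Hypothetical one-step predictor: linear extrapolation from the
    last two observed enemy states (not the predictor used in the paper)."""
    return [2.0 * c - p for p, c in zip(prev, curr)]


def build_observation(
    own: Sequence[float],
    enemy_prev: Sequence[float],
    enemy_curr: Sequence[float],
) -> List[float]:
    """Concatenate own state, current enemy state, and predicted enemy state
    into one observation vector for the policy (assumed layout)."""
    predicted = predict_enemy_state(enemy_prev, enemy_curr)
    return list(own) + list(enemy_curr) + predicted


def segmented_reward(
    distance: float,
    angle_reward: float,
    distance_reward: float,
    near_threshold: float = 1000.0,  # hypothetical segment boundary
) -> float:
    """Weight the reward terms differently per situation segment.
    Two segments and their weights are illustrative assumptions."""
    if distance > near_threshold:
        # Far segment: emphasize closing the distance.
        w_angle, w_dist = 0.3, 0.7
    else:
        # Near segment: emphasize angular advantage.
        w_angle, w_dist = 0.7, 0.3
    return w_angle * angle_reward + w_dist * distance_reward
```

In a setup like this, the observation from `build_observation` would replace the current-state-only input of a standard SAC agent, and `segmented_reward` would replace a single fixed reward weighting.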