UAV air combat autonomous trajectory planning method based on robust adversarial reinforcement learning

Lixin Wang, Sizhuang Zheng, Shang Tai, Hailiang Liu, Ting Yue

doi:10.1016/j.ast.2024.109402

Abstract

Air combat autonomous trajectory planning (ATP) strategies trained with vanilla reinforcement learning (RL) methods have poor robustness because they depend on the source environment. When such an ATP strategy is transferred from the source environment to the target environment, disturbances may prevent it from winning the dogfight and can threaten the flight safety of the unmanned combat aerial vehicle (UCAV). To enhance the robustness of the ATP strategy, we propose a robust RL-based method for generating it that accounts for uncertainties in the state and potential model misspecifications. First, a jointly trained adversary is designed to apply disturbances in the environment, forming a two-player zero-sum game with the ATP agent. To solve this game and learn a robust ATP strategy, the robust adversarial RL (RARL) method is applied. Additionally, to prevent the non-convergence of strategies that introducing the RARL algorithm can cause, a curriculum learning method is proposed that emulates the way human pilots learn, progressing gradually from simpler to more challenging courses. Finally, a quantitative method for evaluating the robustness of the ATP strategy is proposed, based on the differences in ATP performance between the source and target combat environments. The robustness evaluation results demonstrate that the proposed RARL-based method effectively enhances the robustness of the ATP strategy.

Full Text