Abstract

Reinforcement Learning (RL) algorithms enhance the intelligence of air combat Autonomous Maneuver Decision (AMD) policies, but the learned policies may underperform in target combat environments with disturbances. To enhance the robustness of the AMD strategy learned by RL, this study proposes a Tube-based Robust RL (TRRL) method. First, this study introduces a tube to describe the reachable trajectories under disturbances, formulates a method for calculating tubes based on sum-of-squares programming, and proposes the TRRL algorithm, which enhances robustness by using tube size as a quantitative indicator. Second, this study introduces offline techniques for regressing the tube size function and establishing a tube library before policy learning, aiming to eliminate complex online tube solving and reduce the computational burden during training. Furthermore, an analysis of the tube library demonstrates that a more moderate AMD strategy achieves greater robustness, because smaller tube sizes correspond to more cautious actions. This finding indicates that TRRL enhances robustness by promoting a conservative policy. To balance aggressiveness and robustness effectively, the proposed TRRL algorithm introduces a “laziness factor” as the weight of robustness. Finally, combat simulations in an environment with disturbances confirm that the AMD policy learned by the TRRL algorithm achieves superior air combat performance compared with selected robust RL baselines.
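The abstract does not give TRRL's exact objective, so the following is only a minimal sketch of how an offline tube library and a laziness factor could interact, under stated assumptions. It assumes that tube sizes were precomputed offline at a grid of normalized control magnitudes `u` in [0, 1] (the grid and its values are hypothetical placeholders, not results from the paper). It also assumes that piecewise-linear interpolation stands in for the regressed tube size function and that robustness enters as a penalty equal to the laziness factor times the tube size. The names `GRID_U`, `GRID_TUBE`, `tube_size`, `shaped_reward`, and `select_action` are illustrative only, and the one-step greedy choice stands in for a learned RL policy.

```python
import bisect

# Hypothetical offline tube library: tube sizes precomputed at a grid of
# normalized control magnitudes u in [0, 1]. Placeholder values only.
GRID_U = [0.0, 0.5, 1.0]
GRID_TUBE = [1.0, 2.5, 6.0]


def tube_size(u: float) -> float:
    """Interpolate the tube library (stand-in for the regressed tube size function).

    A table lookup replaces solving a sum-of-squares program online.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("control magnitude must lie in [0, 1]")
    i = bisect.bisect_right(GRID_U, u)
    if i >= len(GRID_U):  # u equals the last grid point
        return GRID_TUBE[-1]
    u0, u1 = GRID_U[i - 1], GRID_U[i]
    t0, t1 = GRID_TUBE[i - 1], GRID_TUBE[i]
    return t0 + (t1 - t0) * (u - u0) / (u1 - u0)


def shaped_reward(task_reward: float, u: float, laziness: float) -> float:
    """Assumed robustness-weighted reward: task reward minus laziness * tube size.

    laziness = 0 recovers the unmodified (aggressive) objective; larger values
    favor actions with smaller tubes, i.e. more cautious maneuvers.
    """
    if laziness < 0:
        raise ValueError("laziness factor must be non-negative")
    return task_reward - laziness * tube_size(u)


def select_action(task_rewards: dict, laziness: float) -> float:
    """Greedy one-step choice among candidate control magnitudes (illustrative)."""
    return max(task_rewards, key=lambda u: shaped_reward(task_rewards[u], u, laziness))


if __name__ == "__main__":
    # Hypothetical task rewards: more aggressive maneuvers earn more reward.
    candidates = {0.0: 1.0, 0.5: 3.0, 1.0: 5.0}
    for lam in (0.0, 1.0, 2.0):
        print(lam, select_action(candidates, lam))
```

With these placeholder numbers, raising the laziness factor from 0 to 1 to 2 shifts the choice from the most aggressive action (`u = 1.0`) to a moderate one (`u = 0.5`) and then to the most cautious one (`u = 0.0`). This mirrors, qualitatively, the aggressiveness/robustness trade-off that the abstract attributes to the laziness factor.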
