Abstract

This paper comprehensively investigates the problem of impulsive orbital pursuit-evasion games (OPEGs) using an artificial-intelligence-based approach. First, the mathematical model for impulsive OPEGs, in which both the pursuer and the evader perform their orbital maneuvers by applying impulsive velocity increments, is constructed. Second, the impulsive OPEG problem is transformed into a bilateral optimization problem with a min–max index on the terminal time, subject to multiple constraints such as maneuverability, total fuel consumption, and mission time. To determine the optimal impulsive maneuvers for both sides, a PRD-MADDPG (Predict-Reward-Detect Multi-Agent Deep Deterministic Policy Gradient) algorithm within the framework of multi-agent reinforcement learning is designed. This novel algorithm uses the basic MADDPG to train and learn the strategies, and applies the supplemental PRD mechanism to predict the change of the game state during the interval between two adjacent impulsive maneuvers, incorporating this information into the training in the form of a predicted reward. Finally, several pursuit-evasion missions near the Geosynchronous Earth Orbit are numerically analyzed to verify the validity and effectiveness of the algorithm. The results show that the PRD-MADDPG algorithm efficiently finds applicable strategies even under rather complex constraints. It is also shown that the learned strategies can be effectively applied in extended scenarios that are not seen during training.
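To illustrate the PRD idea described above, the following is a minimal sketch of how a predicted reward over one coast arc between two adjacent impulses might be computed. It assumes Clohessy-Wiltshire relative dynamics about a GEO reference orbit and illustrative reward weights; all function names, weights, and the choice of dynamics model are assumptions for exposition, not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of the Predict-Reward-Detect (PRD) idea: between two
# impulsive maneuvers the game state drifts ballistically, so the state
# predicted at the next decision epoch is folded into the reward seen by
# the MADDPG learner. Constants and weights below are placeholders.

MU = 398600.4418e9                 # Earth's gravitational parameter [m^3/s^2]
GEO_RADIUS = 42164.0e3             # GEO orbital radius [m]
N = np.sqrt(MU / GEO_RADIUS**3)    # mean motion of the reference orbit [rad/s]

def cw_transition(dt, n=N):
    """Clohessy-Wiltshire state-transition matrix for a coast of duration dt."""
    s, c = np.sin(n * dt), np.cos(n * dt)
    return np.array([
        [4 - 3*c,       0, 0,    s/n,          2*(1 - c)/n,      0],
        [6*(s - n*dt),  1, 0,   -2*(1 - c)/n,  (4*s - 3*n*dt)/n, 0],
        [0,             0, c,    0,            0,                s/n],
        [3*n*s,         0, 0,    c,            2*s,              0],
        [-6*n*(1 - c),  0, 0,   -2*s,          4*c - 3,          0],
        [0,             0, -n*s, 0,            0,                c],
    ])

def predicted_reward(x_p, x_e, dv_p, dv_e, dt, w_dist=1.0, w_fuel=0.1):
    """Pursuer-side predicted reward over one coast arc (illustrative only).

    x_p, x_e   : 6-element LVLH states (position, velocity) of pursuer/evader.
    dv_p, dv_e : 3-element impulsive velocity increments applied now.
    dt         : coast time until the next impulse opportunity.
    """
    # apply the impulses instantaneously to the velocity components
    x_p = x_p.copy(); x_p[3:] += dv_p
    x_e = x_e.copy(); x_e[3:] += dv_e
    # propagate both spacecraft ballistically to the next decision epoch
    phi = cw_transition(dt)
    x_p_next, x_e_next = phi @ x_p, phi @ x_e
    predicted_range = np.linalg.norm(x_p_next[:3] - x_e_next[:3])
    # reward closing the predicted range, penalize the pursuer's fuel use
    return -w_dist * predicted_range - w_fuel * np.linalg.norm(dv_p)
```

In a full PRD-MADDPG training loop, a term of this kind would presumably be added to each agent's immediate reward before the transition is stored in the replay buffer, with the evader receiving the sign-flipped range term; the weights and the linearized dynamics shown here are placeholders for whatever prediction model the paper actually uses.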
