As space missions grow more complex, the constraints on satellite attitude control become more stringent, particularly for satellites flying in formation. This paper introduces a novel method for attitude control of satellite formations under multiple constraints, based on categorizing and modeling the different constraints. The method uses a Phased Priority Reinforcement Learning (PPRL) approach built on the Deep Deterministic Policy Gradient (DDPG) algorithm. To handle both the complexity of the constraints and the high control dimensionality that multi-satellite coordination introduces, training proceeds in two steps. The first step addresses the multi-constraint problem for individual satellites; the resulting single-satellite experience data is then given higher priority in the experience replay buffer of the second step to improve data utilization efficiency. To address reward sparsity in complex, high-dimensional constraint models, a detailed reward mechanism is proposed that incorporates both local and global constraints into the reward function, making attitude control both efficient and effective. Numerical simulations show that the approach satisfies dynamic, state, and performance constraints and is adaptable and robust. Compared with traditional methods, it achieves significant improvements in control performance and constraint satisfaction, offering a new solution pathway for high-dimensional control problems in multi-constraint satellite formations.
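The idea of raising the priority of single-satellite experience in the second-step replay buffer can be illustrated with a minimal sketch. The abstract does not specify the actual prioritization scheme, so everything here is an assumption made for illustration: the class name `PhasedPriorityBuffer`, the `phase1_boost` factor, and the use of simple priority-proportional sampling are hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class PhasedPriorityBuffer:
    """Replay buffer that samples transitions in proportion to a priority weight.

    Hypothetical illustration: transitions flagged as coming from the first
    (single-satellite) training step receive a larger weight than transitions
    collected during the second step.
    """

    def __init__(self, capacity=10000, phase1_boost=2.0, seed=None):
        # Bounded deques drop the oldest entries once capacity is reached.
        self.items = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.phase1_boost = phase1_boost  # assumed boost factor, not from the paper
        self.rng = random.Random(seed)

    def add(self, transition, from_phase1=False):
        # Phase-1 data gets the boosted priority; all other data gets weight 1.
        priority = self.phase1_boost if from_phase1 else 1.0
        self.items.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample with replacement, with probability proportional to priority.
        return self.rng.choices(
            list(self.items), weights=list(self.priorities), k=batch_size
        )

    def __len__(self):
        return len(self.items)


# Usage: seed the second-step buffer with first-step data, then add new data.
buffer = PhasedPriorityBuffer(capacity=1000, phase1_boost=3.0, seed=0)
buffer.add(("state", "action", 0.5, "next_state"), from_phase1=True)
buffer.add(("state", "action", 0.1, "next_state"), from_phase1=False)
batch = buffer.sample(batch_size=4)
```

In a DDPG training loop, a batch drawn this way would feed the critic and actor updates in place of a uniformly sampled batch. A fixed boost is the simplest choice; a scheme that also weights by temporal-difference error would be a natural alternative.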