Mission planning for deep space detectors is a pivotal step in the successful execution of detection missions. Traditional planning approaches, which typically treat mission planning and data transmission scheduling as separate problems, adapt poorly to uncertainties and often fail to respond to unforeseen opportunity targets. This paper introduces a mission planning method for deep space detectors designed to address these challenges. First, a Markov Decision Process (MDP) model is formulated that integrates mission planning and data transmission scheduling while accounting for the planning balance between detection missions and opportunity targets. Subsequently, a Planning Balance with Proximal Policy Optimization (PB-PPO) algorithm is proposed. The algorithm, built on the Proximal Policy Optimization (PPO) algorithm, incorporates orthogonal initialization to afford improved control over parameter updates, and a dynamic learning rate strategy is implemented to accelerate convergence. Experimental results show that PB-PPO achieves rewards that are 4.42%, 6.78%, 18.32%, and 26.81% higher than those of the compared algorithms. In addition, PB-PPO addresses the planning balance between detection missions and opportunity targets, ensuring stable reward growth even when a significant number of opportunity targets are planned. In summary, PB-PPO combines the MDP model, deep reinforcement learning, and these strategies, making it a robust solution for detector mission planning in the complex and dynamic environment of deep space.
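To make the two training refinements named in the abstract concrete, the following is a minimal, hypothetical sketch of orthogonal weight initialization and a decaying learning rate applied to a small PPO-style actor network in PyTorch. It illustrates the general techniques only; the network sizes, the `LinearLR` schedule, and the loop structure are illustrative assumptions, not the authors' PB-PPO implementation.

```python
# Illustrative sketch (not the paper's PB-PPO): orthogonal initialization
# and a decaying learning rate for a small PPO-style actor network.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Small policy network mapping a state vector to action logits."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        # Orthogonal initialization: keeps early policy updates well-conditioned,
        # giving finer control over parameter updates at the start of training.
        for layer in self.net:
            if isinstance(layer, nn.Linear):
                nn.init.orthogonal_(layer.weight, gain=nn.init.calculate_gain("tanh"))
                nn.init.zeros_(layer.bias)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


actor = Actor(state_dim=16, action_dim=4)
optimizer = torch.optim.Adam(actor.parameters(), lr=3e-4)

# Dynamic learning rate: decay the step size over training so updates are
# aggressive early (fast convergence) and conservative later (stability).
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=1000
)

for update in range(1000):
    # ... collect rollouts and compute the PPO clipped-surrogate loss here,
    # then: loss.backward(); optimizer.step(); optimizer.zero_grad()
    scheduler.step()  # advance the learning-rate schedule each policy update
```

The design choice illustrated here is standard in PPO implementations: initialization controls the scale of the very first policy updates, while the schedule controls how that scale shrinks as the policy converges.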