This paper investigates the application of Deep Reinforcement Learning (DRL) to trajectory optimization for far-distance rapid spacecraft rendezvous, using the state-of-the-art Proximal Policy Optimization (PPO) algorithm to solve the continuous high-thrust minimum-fuel trajectory optimization problem. The J2 perturbation is considered, its impact on the spacecraft's on-orbit operation and trajectory design is analyzed, and the effectiveness and accuracy of the proposed method are verified in two far-distance rapid rendezvous missions. To ensure the safety of the subsequent close-range operation phase, a safe-area reward framework is proposed, and both sparse and dense safe-area reward functions are designed. The dense safe-area reward function significantly improves the training efficiency of the algorithm while preserving terminal performance. In addition, the uncertainties that may arise during on-orbit operation, including observation uncertainty, state uncertainty, and control uncertainty, are modeled and analyzed, and the performance of the proposed method under these uncertainties is verified through simulation. The closed-loop performance of the policy is further evaluated via Monte Carlo simulations. The results show that the PPO algorithm can effectively handle the rendezvous problem in uncertain environments. These preliminary results demonstrate the great potential of DRL methods for achieving autonomous spacecraft guidance.
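The abstract refers to the J2 perturbation without stating its form. For readers unfamiliar with it, the following is a minimal sketch of the standard J2 oblateness acceleration in an Earth-centered inertial frame, using commonly published constant values; it is an illustrative formula, not the paper's implementation, and the function name is chosen here for clarity.

```python
import math

MU = 398600.4418      # Earth's gravitational parameter, km^3/s^2
J2 = 1.08262668e-3    # Earth's J2 zonal harmonic coefficient (dimensionless)
RE = 6378.137         # Earth's equatorial radius, km

def j2_acceleration(x, y, z):
    """Perturbing acceleration (km/s^2) due to Earth's J2 oblateness,
    given an inertial position (x, y, z) in km."""
    r2 = x * x + y * y + z * z
    r = math.sqrt(r2)
    k = -1.5 * J2 * MU * RE**2 / r**5   # common leading factor
    zr = 5.0 * z * z / r2               # 5 z^2 / r^2 term
    return (k * x * (1.0 - zr),
            k * y * (1.0 - zr),
            k * z * (3.0 - zr))
```

At a 7000 km equatorial position the resulting magnitude is on the order of 1e-5 km/s^2, roughly three orders of magnitude below the two-body gravitational acceleration, which is why J2 matters for trajectory design over multi-revolution rendezvous but acts only as a perturbation.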