Abstract

Autonomous avoidance of multiple space debris collisions by spacecraft has attracted significant interest worldwide. Although deep reinforcement learning (DRL) offers a suitable model-free, data-driven framework, applying it to autonomous spacecraft collision avoidance remains difficult owing to limitations in constraint satisfaction and environment state perception. In this research, a state-of-the-art penalized proximal policy optimization (P3O) method is applied to the spacecraft's autonomous collision avoidance problem, which is formalized as a constrained Markov decision process (CMDP). In contrast with traditional DRL methods, P3O can satisfy the multiple constraints of actual spacecraft operation while learning efficiently in multi-dimensional, continuous state and action spaces. The scalability of the P3O algorithm is enhanced by exploiting the feature extraction and variable-length input capabilities of long short-term memory (LSTM) networks, allowing the trained policy to handle a variable number of space debris objects without retraining. The proposed method is compared with five other methods in simulation cases, which verifies its superior performance in terms of scalability, energy consumption, collision probability, and constraint satisfaction.
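To make the two ingredients named above concrete, the following is a minimal sketch (not the authors' code) of an LSTM policy that accepts a variable number of debris objects and of a P3O-style exact-penalty surrogate loss. The network sizes, feature dimensions, penalty factor kappa, and the cost_gap term are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMPolicy(nn.Module):
    """Encodes a variable-length debris sequence, then maps the summary
    (together with the spacecraft's own state) to a Gaussian thrust action."""
    def __init__(self, debris_dim=6, state_dim=6, hidden=64, action_dim=3):
        super().__init__()
        self.lstm = nn.LSTM(debris_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, debris, state):
        # debris: (batch, n_debris, debris_dim); n_debris may differ between
        # episodes, which is what lets one trained network scale to new
        # debris counts without retraining.
        _, (h, _) = self.lstm(debris)  # final hidden state summarizes all debris
        mean = self.head(torch.cat([h[-1], state], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())

def p3o_loss(ratio, adv_r, adv_c, cost_gap, kappa=20.0, eps=0.2):
    """P3O-style penalized surrogate: maximize the clipped reward objective
    while penalizing the (clipped) estimate of constraint violation.
    ratio        pi_new(a|s) / pi_old(a|s)
    adv_r, adv_c reward and cost advantages
    cost_gap     J_C(pi_old) - d, positive when the old policy exceeds
                 the cost limit d (an assumed precomputed scalar here)
    """
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    l_reward = torch.min(ratio * adv_r, clipped * adv_r).mean()
    l_cost = torch.max(ratio * adv_c, clipped * adv_c).mean() + cost_gap
    # ReLU implements the exact penalty: no gradient once the constraint holds.
    return -l_reward + kappa * F.relu(l_cost)
```

The ReLU term is the key design choice: when the constraint estimate is satisfied (l_cost <= 0) the loss reduces to the standard PPO clipped objective, so constraint pressure is applied only while a violation is predicted.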
