Aerobraking is a technique for slowing a spacecraft and inserting it into a low orbit around a planet. It consists of many orbital passes through the planet's atmosphere, whose drag provides the braking. These atmospheric passes are challenging because the atmospheric environment is highly variable. For this reason, autonomous aerobraking planning is essential for safety and mission performance. This paper develops a parallel domain randomized deep reinforcement learning architecture for autonomous decision-making in stochastic environments, such as aerobraking atmospheric passes. In this context, the architecture is used to plan aerobraking maneuvers that avoid thermal violations during the atmospheric passes and target a final low-altitude orbit. The architecture is designed to account for large variability of the physical model, as well as uncertain conditions. In addition, the parallel approach speeds up training for simulation-based applications, and the domain randomization improves the generalization of the resulting policy. To use this architecture, a Markov decision process framework is developed for a general aerobraking-type mission. A three-dimensional running reward function, expressed in terms of the spacecraft state and action, is designed. This framework is applied to the 2001 Mars Odyssey aerobraking campaign, which is also used to verify the performance of the parallel domain randomized deep reinforcement learning architecture.
The proposed architecture is compared against the 2001 Mars Odyssey mission flight data and a state-of-the-art Numerical Predictor Corrector (NPC)-based heuristic for autonomous aerobraking. It outperforms the heuristic algorithm, with an average increase of 87.2% in the cumulative reward and a decrease of 97.5% in the number of thermal violations. Specifically, the proposed architecture is able to predict and avoid thermal violations while requiring fewer computational resources. Furthermore, relative to the Mars Odyssey mission flight data, it yields a decrease of 98.7% in the number of thermal violations and requires 13.9% fewer orbits, with a comparable aerobraking duration and ΔV budget. Results also show that the proposed architecture can learn a generalized policy in the presence of strong uncertainties, such as aggressive atmospheric density perturbations, different atmospheric density models, and a different simulator maximum step size and error accuracy. To this end, a generalization analysis is performed using out-of-distribution environments.
The generalization analysis shows that the architecture can perform safe aerobraking campaigns, with a maximum increase of only 2 in the number of thermal violations for cases in which the simulator accurately describes the physical model.
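
As an illustration only, and not the implementation used in this work, the idea of combining domain randomization with parallel workers can be sketched as follows. Each worker receives its own independently sampled simulator configuration, so the policy is trained across a spread of environments rather than a single nominal one. The names `EnvParams`, `sample_env_params`, and `make_worker_params`, as well as the parameter ranges, are hypothetical and chosen purely for the example.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class EnvParams:
    """Hypothetical simulator settings that vary between training environments."""
    density_scale: float   # multiplier applied to the nominal atmospheric density
    max_step_size: float   # simulator maximum step size, in seconds


def sample_env_params(rng: random.Random) -> EnvParams:
    """Draw one randomized environment; the ranges here are illustrative assumptions."""
    return EnvParams(
        density_scale=rng.uniform(0.5, 1.5),
        max_step_size=rng.choice([1.0, 5.0, 10.0]),
    )


def make_worker_params(n_workers: int, seed: int) -> List[EnvParams]:
    """Give each parallel worker its own independently seeded, randomized environment."""
    rngs = [random.Random(seed + i) for i in range(n_workers)]
    return [sample_env_params(rng) for rng in rngs]


if __name__ == "__main__":
    for worker_id, params in enumerate(make_worker_params(n_workers=4, seed=0)):
        print(worker_id, params)
```

In a full training loop, each worker would resample its parameters at the start of every episode and return its rollouts to a shared learner; that part is omitted here because it depends on the specific reinforcement learning algorithm and simulator.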