Abstract

This paper presents a 3 Degrees-of-Freedom (DoF) rocket landing model environment, controlled by an agent trained with the Proximal Policy Optimization (PPO) reinforcement learning algorithm. The objectives of this work are to model the dynamics of a rocket and its environment, convert them into a simulated environment suitable for reinforcement learning, and evaluate the PPO training results. This work contributes by implementing realistic models and by contrasting a basic implementation of PPO with another advanced reinforcement learning technique. The proposed model is a 3-DoF longitudinal rocket with mass-varying properties, landing gear, and stochastic wind disturbances. The environment's observation space comprises kinematic and contact properties only, a subset of all time-varying properties. The action space consists of three elements: main thruster effort, nozzle angle, and side thruster effort. The reward is computed from the state, fuel consumption, action transitions, and termination status. Simple control techniques are generally unable to stabilize such complex systems; reinforcement learning is therefore chosen to tackle the problem, and PPO in particular for its theoretical training stability and its handling of continuous observation and action spaces. Training and policy deployment assessments are presented to verify the efficacy of the algorithm and the controllability of the proposed problem.
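The abstract specifies the structure of the observation and action spaces but not their dimensions, bounds, or dynamics. The sketch below, written in Python against the Gymnasium API, illustrates one plausible encoding of that structure; the class name `RocketLanding3DoF`, the eight-element state layout, and all bounds are assumptions for illustration, and the dynamics and reward are left as placeholders rather than the paper's actual models.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class RocketLanding3DoF(gym.Env):
    """Illustrative skeleton of a 3-DoF longitudinal rocket-landing env.

    Dimensions and bounds are placeholder assumptions; the paper gives
    only the structure of the spaces, not their numerical values.
    """

    def __init__(self):
        # Observation: kinematic states (assumed here as x, y, vx, vy,
        # pitch, pitch rate) plus two landing-gear contact flags.
        obs_low = np.array([-np.inf] * 6 + [0.0, 0.0], dtype=np.float32)
        obs_high = np.array([np.inf] * 6 + [1.0, 1.0], dtype=np.float32)
        self.observation_space = spaces.Box(obs_low, obs_high, dtype=np.float32)
        # Action: main thruster effort in [0, 1], nozzle angle in [-1, 1],
        # side thruster effort in [-1, 1] (normalized; scaling is assumed).
        self.action_space = spaces.Box(
            low=np.array([0.0, -1.0, -1.0], dtype=np.float32),
            high=np.array([1.0, 1.0, 1.0], dtype=np.float32),
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.zeros(8, dtype=np.float32)  # placeholder initial state
        return self.state, {}

    def step(self, action):
        # A real implementation would integrate the mass-varying 3-DoF
        # dynamics with stochastic wind here; omitted in this sketch.
        # Per the abstract, the reward combines state, fuel consumption,
        # action transitions, and termination status.
        reward = 0.0
        terminated, truncated = False, False
        return self.state, reward, terminated, truncated, {}
```

Such an environment could then be trained with an off-the-shelf PPO implementation, for example Stable-Baselines3, since PPO handles continuous Box spaces on both sides: `PPO("MlpPolicy", RocketLanding3DoF()).learn(total_timesteps=100_000)`. This is a usage sketch, not the authors' training setup.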
