Abstract

This paper presents a 3 Degrees-of-Freedom (DoF) rocket landing model environment, controlled by an agent trained with the Proximal Policy Optimization (PPO) reinforcement learning algorithm. The objectives of this work are to model the dynamics of a rocket and its environment, convert them into a simulated environment suitable for reinforcement learning, and evaluate the PPO training results. This work contributes by implementing realistic models and by contrasting a basic implementation of PPO with another advanced reinforcement learning technique. The proposed model is a 3-DoF longitudinal rocket with mass-varying properties, landing gear, and stochastic wind disturbances. The environment's observation space comprises kinematic and contact properties only, a subset of all time-varying properties. The action space consists of three elements: main thruster effort, nozzle angle, and side thruster effort. The reward is computed from the state, fuel consumption, action transitions, and termination status. Simple control techniques are generally unable to stabilize such complex systems; reinforcement learning is therefore chosen to tackle the problem, and PPO in particular for its theoretical training stability and its handling of continuous observation and action spaces. Training and policy deployment assessments are presented to verify the efficacy of the algorithm and the controllability of the proposed problem.
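The abstract specifies the structure of the observation and action spaces but not their dimensions, bounds, or dynamics. The sketch below, written in Python against the Gymnasium API, illustrates one plausible encoding of that structure; the class name `RocketLanding3DoF`, the eight-element state layout, and all bounds are assumptions for illustration, and the dynamics and reward are left as placeholders rather than the paper's actual models.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class RocketLanding3DoF(gym.Env):
    """Illustrative skeleton of a 3-DoF longitudinal rocket-landing env.

    Dimensions and bounds are placeholder assumptions; the paper gives
    only the structure of the spaces, not their numerical values.
    """

    def __init__(self):
        # Observation: kinematic states (assumed here as x, y, vx, vy,
        # pitch, pitch rate) plus two landing-gear contact flags.
        obs_low = np.array([-np.inf] * 6 + [0.0, 0.0], dtype=np.float32)
        obs_high = np.array([np.inf] * 6 + [1.0, 1.0], dtype=np.float32)
        self.observation_space = spaces.Box(obs_low, obs_high, dtype=np.float32)
        # Action: main thruster effort in [0, 1], nozzle angle in [-1, 1],
        # side thruster effort in [-1, 1] (normalized; scaling is assumed).
        self.action_space = spaces.Box(
            low=np.array([0.0, -1.0, -1.0], dtype=np.float32),
            high=np.array([1.0, 1.0, 1.0], dtype=np.float32),
        )

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.zeros(8, dtype=np.float32)  # placeholder initial state
        return self.state, {}

    def step(self, action):
        # A real implementation would integrate the mass-varying 3-DoF
        # dynamics with stochastic wind here; omitted in this sketch.
        # Per the abstract, the reward combines state, fuel consumption,
        # action transitions, and termination status.
        reward = 0.0
        terminated, truncated = False, False
        return self.state, reward, terminated, truncated, {}
```

Such an environment could then be trained with an off-the-shelf PPO implementation, for example Stable-Baselines3, since PPO handles continuous Box spaces on both sides: `PPO("MlpPolicy", RocketLanding3DoF()).learn(total_timesteps=100_000)`. This is a usage sketch, not the authors' training setup.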
