Abstract
This paper proposes an autonomous optimal trajectory planning method for hypersonic vehicles (HVs) based on the deep deterministic policy gradient (DDPG) algorithm of reinforcement learning (RL). First, the trajectory planning problem is converted into a Markov decision process (MDP), with the bank-angle amplitude designated as the control input. The reward function of the MDP is set to minimize the terminal position error of the trajectory while satisfying hard constraints. Deep neural networks (DNNs) are used to approximate the policy function and action-value function in the DDPG framework, so the actor network computes the control input directly from the flight states. Using a limited exploration strategy, the policy network is considered fully trained once the reward value converges to its maximum. Simulation results show that the policy network trained with the DDPG algorithm accomplishes three-dimensional (3D) trajectory planning during the HV glide phase with high terminal precision and stable convergence. Moreover, the single-step computation time of the policy network is near real time, suggesting great potential as an autonomous online trajectory planner. Monte Carlo experiments demonstrate the strong robustness of the autonomous trajectory planner under aerodynamic disturbances.
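To make the abstract's formulation concrete, the sketch below illustrates (in plain NumPy, not the authors' code) the two ingredients it names: a deterministic actor network mapping flight states to a bank-angle command, and a reward that penalizes terminal position error while enforcing hard constraints. The layer sizes, the six-element state vector, the 80° bank-angle limit, and the constraint-violation penalty are all assumptions for illustration only.

```python
import numpy as np

BANK_LIMIT_DEG = 80.0  # assumed bank-angle amplitude limit (illustrative)

def init_actor(state_dim=6, hidden=64, action_dim=1, seed=0):
    """Random small-weight initialization of a two-layer actor network.
    The state is assumed to be [altitude, longitude, latitude, velocity,
    flight-path angle, heading] -- a common glide-phase state choice,
    not taken from the paper."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (state_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, action_dim)),
        "b2": np.zeros(action_dim),
    }

def actor_forward(params, state):
    """Deterministic policy: flight state -> bank-angle command (degrees).
    The final tanh squashes the output to [-1, 1]; scaling by the limit
    keeps the command inside the admissible bank-angle amplitude."""
    h = np.tanh(state @ params["W1"] + params["b1"])
    return BANK_LIMIT_DEG * np.tanh(h @ params["W2"] + params["b2"])

def reward(terminal_error_km, constraint_violated):
    """Sketch of the shaped reward: the smaller the terminal position
    error, the higher the reward; a large fixed penalty (hypothetical
    value) is applied when a hard path constraint is violated."""
    if constraint_violated:
        return -100.0
    return -terminal_error_km

# Toy usage: evaluate the untrained policy on one flight state.
params = init_actor()
state = np.array([60.0, 0.0, 0.0, 6000.0, -1.0, 45.0])
cmd = actor_forward(params, state)
```

In the full DDPG loop this actor would be trained against a critic (action-value) network using replay-buffer transitions; only the forward pass relevant to online planning is sketched here.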
Published in: Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering