Abstract

This research studies the application of reinforcement learning to spacecraft angles-only rendezvous guidance in the presence of multiple constraints and uncertainties. To apply a standard reinforcement learning framework, the stochastic optimal control problem is formulated as a discrete-time Markov decision process, and the observability of angles-only navigation is incorporated into the construction of the reward function. A proximal policy optimization algorithm is used to train a deep neural network (DNN) that maps observations of the spacecraft state to an optimal control strategy. The trained guidance network provides a robust nominal orbit and the corresponding closed-loop guidance law, which has the potential to improve the performance of closed-loop navigation, guidance, and control. Numerical simulation results are given for an example of far-range rendezvous approach guidance. The influence of different weights of the observability term in the reward function on the optimal control strategy is studied. The robustness of the obtained closed-loop control law is evaluated and verified through Monte Carlo simulations in disturbed scenarios.
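To make the approach described above concrete, the Python sketch below illustrates, under stated assumptions, how an observability term might be folded into the per-step reward of the Markov decision process and how a Monte Carlo robustness check could be organized. The state convention, function names, weights, and the line-of-sight-drift proxy used as the observability metric are illustrative assumptions for this sketch, not the paper's actual formulation.

```python
import numpy as np

# Assumed state convention (illustrative): relative position r (3,) and velocity v (3,)
# of the chaser with respect to the target, expressed in the target's LVLH frame.

def line_of_sight_angle_change(r_prev, r_curr):
    """Proxy observability metric: angular drift of the line-of-sight (bearing)
    direction between consecutive steps. For angles-only navigation, larger
    bearing variation tends to improve observability; this simple proxy stands
    in for whatever observability measure the paper actually uses."""
    u_prev = r_prev / np.linalg.norm(r_prev)
    u_curr = r_curr / np.linalg.norm(r_curr)
    cos_angle = np.clip(np.dot(u_prev, u_curr), -1.0, 1.0)
    return np.arccos(cos_angle)

def step_reward(r_prev, r_curr, dv, w_fuel=1.0, w_obs=0.5):
    """Per-step reward: penalize delta-v usage, reward the observability proxy.
    The weight w_obs plays the role of the observability-term weight whose
    influence on the optimal control strategy the abstract says is studied."""
    fuel_cost = np.linalg.norm(dv)
    obs_bonus = line_of_sight_angle_change(r_prev, r_curr)
    return -w_fuel * fuel_cost + w_obs * obs_bonus

def terminal_reward(r_final, v_final, r_goal, w_pos=10.0, w_vel=10.0):
    """Terminal reward penalizing miss distance and residual relative velocity
    at the end of the approach (weights are placeholders)."""
    return -(w_pos * np.linalg.norm(r_final - r_goal)
             + w_vel * np.linalg.norm(v_final))

def monte_carlo_evaluate(policy, simulate_episode, n_runs=500, seed=0):
    """Robustness check of a trained policy under dispersed initial conditions
    and navigation/actuation errors. `simulate_episode(policy, rng)` is a
    hypothetical helper assumed to return (total_reward, final_miss_distance)
    for one disturbed episode."""
    rng = np.random.default_rng(seed)
    rewards, misses = [], []
    for _ in range(n_runs):
        total_reward, miss = simulate_episode(policy, rng)
        rewards.append(total_reward)
        misses.append(miss)
    return {"mean_reward": float(np.mean(rewards)),
            "miss_p95": float(np.percentile(misses, 95))}

if __name__ == "__main__":
    # Minimal smoke test with random vectors, purely to show the reward pieces compose.
    rng = np.random.default_rng(1)
    r0, r1 = rng.normal(size=3) * 1e3, rng.normal(size=3) * 1e3
    dv = rng.normal(size=3) * 0.1
    print("step reward:", step_reward(r0, r1, dv, w_obs=0.5))
```

In such a setup, the DNN policy trained with proximal policy optimization would consume the observed state and output the control (e.g., an impulsive delta-v) at each decision epoch, with the cumulative reward trading off fuel, terminal accuracy, and the observability term.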
