Reinforcement learning is used to design minimum-time trajectories of solar sails subject to the typical sources of uncertainty associated with such a propulsion system, i.e., inaccurate knowledge of the sail’s optical properties and the presence of wrinkles on the sail membrane. A proximal policy optimization (PPO) algorithm is used to train the agent and derive the control policy that associates the optimal sail attitude with each dynamic state. First, the agent is trained assuming deterministic, unperturbed dynamics, and the results are compared with optimal solutions found by an indirect optimization method, thus demonstrating the effectiveness of this approach. Next, two stochastic scenarios are analyzed. In the first, the optical coefficients of the sail are assumed to be random variables with a Gaussian distribution, which leads to random variations in the sail characteristic acceleration. In the second scenario, wrinkles on the sail membrane are taken into account, resulting in a misalignment of the thrust vector with respect to that of a perfectly smooth surface. Both phenomena are modeled on experimental measurements available in the literature, so that the analyses are realistic. In the stochastic scenarios, Monte Carlo simulations are performed using the trained policies, demonstrating that the reinforcement learning approach is capable of finding near time-optimal solutions while also being robust to the sources of uncertainty considered.
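To make the stochastic evaluation concrete, the sketch below shows how a Monte Carlo campaign of this kind might be set up: each rollout samples a Gaussian optical coefficient (scaling the characteristic acceleration) and a wrinkle-induced thrust misalignment, then propagates simplified planar heliocentric sail dynamics under a trained attitude policy. The dynamics model, the probability distributions, the canonical units, and the `placeholder_policy` stand-in for the trained PPO network are illustrative assumptions, not values or code taken from the paper.

```python
# Hedged sketch (not the paper's code): Monte Carlo rollouts of a sail-attitude policy
# under Gaussian uncertainty on the optical coefficients and a wrinkle-induced
# thrust misalignment. All numerical values below are assumptions for illustration.
import numpy as np

MU = 1.0          # gravitational parameter in canonical units (assumed)
BETA_NOM = 0.05   # nominal sail lightness number (assumed, not from the paper)
RHO_NOM = 0.88    # nominal reflectivity (assumed)

def placeholder_policy(state):
    """Stand-in for the trained PPO policy: returns a sail cone angle [rad].
    A constant, locally optimal orbit-raising angle is used here as a placeholder."""
    return np.arctan(1.0 / np.sqrt(2.0))   # ~35.26 deg

def sail_accel(r, alpha, beta):
    """Ideal-sail acceleration components (radial, transverse) in polar coordinates."""
    a_mag = beta * MU / r**2 * np.cos(alpha)**2
    return a_mag * np.cos(alpha), a_mag * np.sin(alpha)

def rollout(rng, t_end=20.0, dt=1e-3):
    """Propagate planar heliocentric dynamics with sampled optical/wrinkle perturbations."""
    # Gaussian optical coefficient -> perturbed characteristic acceleration (assumption)
    rho = rng.normal(RHO_NOM, 0.01)
    beta = BETA_NOM * (1.0 + rho) / (1.0 + RHO_NOM)
    # Wrinkle-induced thrust misalignment, ~0.5 deg standard deviation (assumption)
    delta = rng.normal(0.0, np.deg2rad(0.5))

    r, theta, vr, vt = 1.0, 0.0, 0.0, 1.0     # start on a circular orbit at 1 AU
    for _ in range(int(t_end / dt)):
        alpha = placeholder_policy((r, theta, vr, vt)) + delta
        ar, at = sail_accel(r, alpha, beta)
        # explicit Euler step (sufficient for a sketch; the paper's propagator may differ)
        r_dot = vr
        vr_dot = vt**2 / r - MU / r**2 + ar
        vt_dot = -vr * vt / r + at
        r, theta = r + r_dot * dt, theta + vt / r * dt
        vr, vt = vr + vr_dot * dt, vt + vt_dot * dt
    return r

rng = np.random.default_rng(0)
final_radii = np.array([rollout(rng) for _ in range(100)])
print(f"final radius: mean={final_radii.mean():.3f}, std={final_radii.std():.3f}")
```

In an evaluation like this, the spread of the rollout outcomes (here, the final orbital radius) quantifies how sensitive the trained policy is to the sampled optical and wrinkle perturbations.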