Abstract

This paper addresses model-free attitude control of rigid spacecraft subject to control torque saturation and external disturbances. Specifically, a model-free deep reinforcement learning (DRL) controller is proposed that learns continuously from environment feedback and achieves high-precision attitude control without repeated retuning of controller parameters. Because both the state space and the action space are continuous, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, which is based on the actor-critic architecture, is adopted; compared with the Deep Deterministic Policy Gradient (DDPG) algorithm, TD3 offers better performance. However, TD3 obtains the optimal policy purely by interacting with the environment, without using any prior knowledge, so the learning process is time-consuming. To address this problem, the PID-Guide TD3 algorithm is proposed, which accelerates training and improves the convergence precision of TD3. To address the difficulty of deploying reinforcement learning (RL) in the actual environment, a pretraining/fine-tuning deployment method is proposed, which not only saves training time and computing resources but also quickly achieves good results. Experimental results show that the DRL controller achieves high-precision attitude stabilization and attitude tracking with fast response and small overshoot, and that the proposed PID-Guide TD3 algorithm trains faster and more stably than the standard TD3 algorithm.
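The abstract does not spell out how the PID controller guides TD3, so the following is only a minimal sketch of one plausible reading: a PID baseline torque is blended with the TD3 policy action during early training and the guidance is annealed away. All function names, gains, limits, and the annealing schedule below are illustrative assumptions, not the paper's published mechanism.

```python
import numpy as np

def pid_action(err, err_int, err_dot, Kp=2.0, Ki=0.01, Kd=5.0, u_max=1.0):
    """Hypothetical PID baseline torque; gains and saturation limit are placeholders."""
    u = Kp * err + Ki * err_int + Kd * err_dot
    return np.clip(u, -u_max, u_max)  # respect actuator saturation

def guided_action(policy_action, pid_baseline, episode, guide_episodes=200):
    """Blend the TD3 policy action with the PID baseline, annealing guidance to zero."""
    beta = max(0.0, 1.0 - episode / guide_episodes)  # guidance weight decays over training
    return beta * np.asarray(pid_baseline) + (1.0 - beta) * np.asarray(policy_action)
```

Under this reading, early episodes are dominated by the (suboptimal but stable) PID behavior, which gives the critic informative transitions to learn from, and the policy gradually takes over as training progresses.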

Highlights

  • With the rapid development of space technology, the structure and composition of On-Orbit Servicing Spacecraft (OOSS) are becoming increasingly complex, and their performance is continually improving

  • To address the difficulty of deploying reinforcement learning (RL) in the actual environment, this paper proposes pretraining the algorithm on the ground and fine-tuning its parameters on orbit, saving training time and computing resources while quickly achieving good results

  • To verify the performance of the End-to-End deep reinforcement learning (DRL) controller and the effectiveness of the reward function proposed in Section 3.2, the agent is trained in an ideal environment without external disturbances to achieve spacecraft attitude stabilization and attitude tracking control, respectively


Summary

Introduction

With the rapid development of space technology, the structure and composition of On-Orbit Servicing Spacecraft (OOSS) are becoming increasingly complex, and their performance is continually improving. Classical attitude control methods include PID control [2], adaptive control [3], sliding mode control [4], Lyapunov control [5], optimal control [6], and robust H∞ control [7]. These control algorithms have achieved good results in simulation experiments and practical applications. In contrast to such hand-tuned designs, DRL iteratively optimizes the parameters of its neural networks during self-learning, which removes the burden of parameter design, allows the controller to adapt to changing software, hardware, and environments, and makes it possible to keep improving controller performance by adjusting the reward function. To address the difficulty of deploying reinforcement learning (RL) in the actual environment, this paper proposes pretraining the algorithm on the ground and fine-tuning its parameters on orbit, which saves training time and computing resources and quickly achieves good results.
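As a rough illustration of the ground-pretraining / on-orbit fine-tuning idea, the PyTorch sketch below shows saving an actor network trained in simulation and reloading it later for low-cost adaptation with a small learning rate. The actor architecture, layer sizes, file name, and learning rates are placeholder assumptions; the paper's actual network and training details are not reproduced here.

```python
import torch
import torch.nn as nn

# Placeholder actor network mapping the attitude state to a saturated torque command.
class Actor(nn.Module):
    def __init__(self, state_dim=6, action_dim=3, u_max=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.u_max = u_max

    def forward(self, state):
        # Tanh output scaled by u_max keeps the commanded torque within actuator limits.
        return self.u_max * self.net(state)

# Ground phase: train the actor in simulation (TD3 loop omitted), then save its weights.
actor = Actor()
# ... ground-based TD3 training would update `actor` here ...
torch.save(actor.state_dict(), "actor_pretrained.pt")

# On-orbit phase: load the pretrained weights and continue training with a much smaller
# learning rate, so only limited data and compute are needed to adapt to the real plant.
actor_onorbit = Actor()
actor_onorbit.load_state_dict(torch.load("actor_pretrained.pt"))
optimizer = torch.optim.Adam(actor_onorbit.parameters(), lr=1e-5)
```

The key design point is that the on-orbit optimizer starts from the simulation-trained weights rather than from scratch, so fine-tuning only has to close the sim-to-real gap instead of relearning the whole control policy.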

Problem Statement and Preliminaries
DRL Controller Design for Spacecraft
Simulation and Results
Case 1
Case 2
Case 3
Conclusions