Abstract

Planetary soft landing has been studied extensively due to its promising application prospects. In this paper, a soft landing control algorithm based on deep reinforcement learning (DRL) with good convergence properties is proposed. First, the soft landing problem of the powered descent phase is formulated and the theoretical basis of reinforcement learning (RL) used in this paper is introduced. Second, to ease convergence, the reward function is designed to include process rewards such as a velocity-tracking reward, which mitigates the sparse-reward problem. By further including a fuel-consumption penalty and a constraint-violation penalty, the lander learns to track the reference velocity while saving fuel and keeping its attitude angles within safe ranges. Training simulations are then carried out under three classical RL frameworks, Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC), all of which converge. Finally, the trained policy is deployed in velocity-tracking and soft landing experiments, whose results demonstrate the validity of the proposed algorithm.
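The paper does not publish code, but as a purely illustrative sketch, the snippet below shows how a policy could be trained under the three frameworks named in the abstract using the Stable-Baselines3 library. The `LanderEnv` factory is a hypothetical placeholder for the powered-descent environment formulated in the paper, not something the authors provide.

```python
# Hypothetical sketch: training one policy per framework named in the
# abstract, via the Stable-Baselines3 library. The environment returned
# by `make_env` is a placeholder for the paper's powered-descent
# environment (it must implement the gymnasium.Env interface).
from stable_baselines3 import DDPG, TD3, SAC

def train_all(make_env, total_timesteps=200_000):
    """Train a policy with each algorithm and return the trained models."""
    models = {}
    for name, algo in [("ddpg", DDPG), ("td3", TD3), ("sac", SAC)]:
        env = make_env()                      # fresh environment per run
        model = algo("MlpPolicy", env, verbose=0)
        model.learn(total_timesteps=total_timesteps)
        model.save(f"{name}_lander")          # reload later for deployment
        models[name] = model
    return models

# Usage (LanderEnv is hypothetical):
# models = train_all(lambda: LanderEnv())
```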

Highlights

  • Planetary soft landing has been studied extensively due to its promising application prospects

  • Based on the dynamic model established above, we design an algorithm based on reinforcement learning (RL) tailored to the characteristics of the soft landing problem, including the selection of observations, the design of the reward function, and other settings concerning how the agent interacts with the environment

  • A process reward is introduced during the landing: a reference velocity is prescribed according to the real-time relative position between the lander and the target landing area (a minimal sketch of such a reward follows this list)
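The paper's actual reward function is not reproduced here; the following is a minimal sketch, assuming a dense shaping term that rewards tracking a position-dependent reference velocity, plus the fuel-consumption and attitude-constraint penalties mentioned in the abstract. The reference-velocity profile and all coefficients (k_ref, k_v, k_f, k_att, att_limit) are illustrative assumptions, not the authors' values.

```python
# Hypothetical reward sketch for the powered-descent agent.
# Profile shape and all coefficients are assumptions for illustration.
import numpy as np

def reference_velocity(rel_pos, k_ref=0.3, v_max=100.0):
    """Reference velocity pointing from the lander toward the target,
    with magnitude shrinking as the lander approaches (assumed profile)."""
    dist = np.linalg.norm(rel_pos)
    speed = min(v_max, k_ref * dist)           # slow down near the target
    return -speed * rel_pos / max(dist, 1e-6)

def step_reward(rel_pos, velocity, thrust, attitude,
                att_limit=np.deg2rad(20), k_v=1.0, k_f=1e-3, k_att=10.0):
    """Dense (process) reward: velocity tracking minus fuel-consumption
    and attitude-constraint-violation penalties."""
    v_ref = reference_velocity(rel_pos)
    track = -k_v * np.linalg.norm(velocity - v_ref)      # velocity tracking
    fuel = -k_f * np.linalg.norm(thrust)                 # fuel penalty
    viol = -k_att * np.sum(np.abs(attitude) > att_limit) # attitude violation
    return track + fuel + viol
```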

Summary

Soft Landing Problem Formulation

The planetary-surface-fixed frame of reference is defined as in Figure 1. Since the powered descent begins at an altitude that is low compared with the planet's radius, and the distance between the lander and the target landing site varies little during this phase, it is appropriate to assume the planet's gravity is a constant g. When the powered descent phase begins, the lander has already released its parachute and its speed is on the order of 100 m/s [22]. As fuel is burned, the lander's mass decreases at the rate ṁ = -‖T‖/(Isp·g0), where Isp is the specific impulse of the engine, ‖T‖ is the thrust magnitude, and g0 is the standard gravitational acceleration; the inertia matrix decreases accordingly as the mass decreases. The shape of the lander is a cuboid with sides of length a × b × c and uniform mass distribution.
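To make the mass and inertia bookkeeping concrete, the sketch below computes the standard closed-form inertia matrix of a uniform cuboid and a single Euler step of the mass-depletion equation above; the engine parameters, time step, and lander dimensions are placeholder values, not taken from the paper.

```python
# Minimal sketch of the mass/inertia bookkeeping described above.
# Engine parameters, time step, and dimensions are placeholders.
import numpy as np

G0 = 9.80665  # standard gravitational acceleration [m/s^2]

def cuboid_inertia(m, a, b, c):
    """Inertia matrix of a uniform cuboid of mass m and sides a x b x c,
    about its center of mass (standard closed-form result)."""
    return (m / 12.0) * np.diag([b**2 + c**2, a**2 + c**2, a**2 + b**2])

def mass_after_step(m, thrust_mag, isp, dt):
    """One Euler step of mdot = -|T| / (Isp * g0)."""
    return m - thrust_mag / (isp * G0) * dt

# Example: as fuel burns, both the mass and the inertia matrix shrink.
m = 1500.0                                    # lander mass [kg] (assumed)
m = mass_after_step(m, thrust_mag=6000.0, isp=300.0, dt=0.1)
I = cuboid_inertia(m, a=2.0, b=2.0, c=1.5)    # sides [m] (assumed)
```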

RL Basis
Soft Landing with DRL
Reward Setting
Observation Space
Action Space
Network Architecture
Simulation Settings
Simulation Results
Conclusions
Degree of Freedom