Abstract

In this work, we provide a novel optimal guidance and control strategy for the lunar hopper obstacle-avoidance, descent, and landing problem and demonstrate its behavior using numerical simulations. More specifically, the major contributions of this paper are threefold: 1) a feedback-based reference trajectory design for lunar hopper guidance, 2) the mathematical models and equations of a linear quadratic tracking (LQT) controller for lunar hopper control, and 3) a method that uses reinforcement learning to optimize the designed reference trajectory in conjunction with the designed LQT controller, the so-called linear quadratic tracking with reinforcement-learning-based reference trajectory optimization (LQT-RTO). We demonstrated the LQT-RTO in a 2-dimensional (2D) lunar hopper simulation environment against 1) the LQT with heuristic reference trajectory design (LQT-HTD) and 2) a reinforcement-learning-based controller (RLC). We confirmed by numerical simulation that the LQT-RTO outperformed the LQT-HTD in terms of fuel consumption and outperformed the RLC in terms of landing success rate. Lastly, we provide a theoretical interpretation of the simulation results.

Highlights

  • It is important to note that the values of the feedback terms k1, k2, and k3 were constant throughout the simulation for the LQT with heuristic reference trajectory design (LQT-HTD) controller (an illustrative sketch of these feedback terms follows this list)

  • In this study, we proposed 1) a feedback-based reference trajectory design for lunar hopper guidance, 2) a linear quadratic tracking controller design for lunar hopper control, and 3) a method to use reinforcement learning in the reference trajectory design optimization, which together constitute a new guidance and control method, the so-called linear quadratic tracking with reinforcement-learning-based reference trajectory optimization (LQT-RTO)
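The paper's own implementation of the feedback terms is not reproduced here; the following is a purely illustrative Python sketch of what a feedback-based reference trajectory with constant gains k1, k2, and k3 could look like. The mapping from gains to reference states, the state layout, and the function name generate_reference are all assumptions, not the authors' design.

```python
import numpy as np

def generate_reference(state, target, k1, k2, k3):
    """Illustrative feedback-based reference point (hypothetical, not the
    paper's actual design). The reference is regenerated from the current
    state every step, which is what makes the design feedback-based.

    state  = (x, y, vx, vy): current hopper position and velocity
    target = (xt, yt): landing-site coordinates
    k1, k2, k3: constant feedback gains (hand-tuned in LQT-HTD,
                optimized by reinforcement learning in LQT-RTO)
    """
    x, y, vx, vy = state
    xt, yt = target
    vx_ref = k1 * (xt - x)            # close the downrange gap
    vy_ref = k2 * (yt - y)            # descend toward the target altitude
    y_ref = yt + k3 * abs(xt - x)     # keep altitude margin for obstacles
    return np.array([x + vx_ref, y_ref, vx_ref, vy_ref])
```

Under this reading, LQT-RTO would search over (k1, k2, k3) with reinforcement learning to reduce fuel use while preserving landing success, and the LQT controller tracks whatever reference the design emits.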

Summary

INTRODUCTION

In 2009, surficial hydroxyl (OH) and potentially water (H2O) were detected on the sunlit lunar surface by near-infrared (NIR) spectrometers on the Chandrayaan-1, EPOXI, and Cassini spacecraft [1]. Exploring such sites up close motivates lunar hoppers, which need more continuous and dynamic trajectory (re)planning to avoid obstacles of different sizes as they fly over terrain at low altitude; as a result, alternative guidance methods are required. We propose a guidance law based on a linear quadratic tracking (LQT) controller that minimizes path-following errors from a dynamically created reference trajectory. We developed a 2-dimensional (2D) lunar surface simulation environment and applied reinforcement learning to find optimal feedback terms for the designed reference trajectory. The reward function of the lunar hopper simulation is based on LunarLanderContinuous-v2 and returns several types of rewards and penalties at each time step depending on the state, i.e., y, and control input, i.e., u, as shown in (1); an episode finishes if the lander crashes or comes to rest, receiving an additional -100 or +100 points, respectively.
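To make the guidance law concrete, here is a minimal sketch of a finite-horizon LQT controller, assuming simple discrete double-integrator dynamics along one axis; the matrices, weights, horizon, and reference below are illustrative placeholders, not the paper's dynamics model.

```python
import numpy as np

# Illustrative discrete double integrator along one axis: x = [position, velocity]
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])
Q = np.diag([10.0, 1.0])   # tracking-error weight (hypothetical)
R = np.array([[0.1]])      # control-effort weight (hypothetical)

def lqt_gains(A, B, Q, R, r_ref):
    """Finite-horizon LQT via a backward Riccati recursion.

    r_ref: (N+1, n) array of reference states. Returns feedback gains K[k]
    and feedforward inputs uff[k] so that u_k = -K[k] @ x_k + uff[k]
    minimizes the sum of (x_k - r_k)' Q (x_k - r_k) + u_k' R u_k.
    """
    N = len(r_ref) - 1
    S = Q.copy()            # terminal cost-to-go S_N = Q
    v = Q @ r_ref[-1]       # terminal costate v_N = Q r_N
    K, uff = [None] * N, [None] * N
    for k in range(N - 1, -1, -1):
        M = R + B.T @ S @ B
        K[k] = np.linalg.solve(M, B.T @ S @ A)   # feedback gain
        uff[k] = np.linalg.solve(M, B.T @ v)     # reference feedforward
        Acl = A - B @ K[k]
        v = Acl.T @ v + Q @ r_ref[k]             # costate recursion
        S = A.T @ S @ Acl + Q                    # Riccati recursion
    return K, uff

# Usage: track a straight descent from 100 m altitude to touchdown in 50 steps.
r_ref = np.linspace([100.0, -2.0], [0.0, 0.0], 51)
K, uff = lqt_gains(A, B, Q, R, r_ref)
x = np.array([100.0, -2.0])
for k in range(50):
    u = -K[k] @ x + uff[k]   # LQT control law
    x = A @ x + B @ u
```

The feedforward recursion is what distinguishes LQT from plain LQR: the costate v carries the upcoming reference into each gain computation, so the controller anticipates the trajectory rather than merely regulating the state to zero.

The terminal bonuses described above follow Gym's LunarLanderContinuous-v2, and the fuel penalties below (-0.3 per step for the main engine, -0.03 for the side engines) are taken from that environment; the rest of this sketch is a placeholder, not the paper's equation (1).

```python
def step_reward(shaping, prev_shaping, main_engine_on, side_engine_on,
                crashed, at_rest):
    """Reward shape patterned on Gym's LunarLanderContinuous-v2 (illustrative)."""
    reward = shaping - prev_shaping          # progress toward the landing pad
    reward -= 0.30 * main_engine_on          # fuel penalty, main engine
    reward -= 0.03 * side_engine_on          # fuel penalty, side engines
    if crashed:
        reward -= 100.0                      # terminal crash penalty
    elif at_rest:
        reward += 100.0                      # terminal landing bonus
    return reward
```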

DYNAMICS MODEL
REFERENCE TRAJECTORY DESIGN
COMPARATIVE ANALYSIS
CONCLUSIONS