Abstract

In this work, we provide a novel optimal guidance and control strategy for the lunar hopper obstacle-avoidance, descent, and landing problem and demonstrate its behavior using numerical simulations. More specifically, the major contributions of this paper are threefold: 1) a feedback-based reference trajectory design for lunar hopper guidance, 2) the mathematical models and equations of a linear quadratic tracking (LQT) controller for lunar hopper control, and 3) a method that uses reinforcement learning to optimize the designed reference trajectory in conjunction with the designed LQT controller, the so-called linear quadratic tracking with reinforcement-learning-based reference trajectory optimization (LQT-RTO). We demonstrated the LQT-RTO in a 2-dimensional (2D) lunar hopper simulation environment against 1) the LQT with heuristic reference trajectory design (LQT-HTD) and 2) a reinforcement-learning-based controller (RLC). We confirmed by numerical simulation that the LQT-RTO outperformed the LQT-HTD in terms of fuel consumption and outperformed the RLC in terms of landing success rate. Lastly, we provide a theoretical interpretation of the simulation results.

Highlights

  • It is important to note that the values of the feedback terms k1, k2, and k3 were constant throughout the simulation for the LQT with heuristic reference trajectory design (LQT-HTD) controller (an illustrative sketch of these feedback terms follows this list)

  • In this study, we proposed 1) a feedback-based reference trajectory design for lunar hopper guidance, 2) a linear quadratic tracking controller design for lunar hopper control, and 3) a method to use reinforcement learning in the reference trajectory design optimization, which together constitute a new guidance and control method, the so-called linear quadratic tracking with reinforcement-learning-based reference trajectory optimization (LQT-RTO)
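The paper's own implementation of the feedback terms is not reproduced here; the following is a purely illustrative Python sketch of what a feedback-based reference trajectory with constant gains k1, k2, and k3 could look like. The mapping from gains to reference states, the state layout, and the function name generate_reference are all assumptions, not the authors' design.

```python
import numpy as np

def generate_reference(state, target, k1, k2, k3):
    """Illustrative feedback-based reference point (hypothetical, not the
    paper's actual design). The reference is regenerated from the current
    state every step, which is what makes the design feedback-based.

    state  = (x, y, vx, vy): current hopper position and velocity
    target = (xt, yt): landing-site coordinates
    k1, k2, k3: constant feedback gains (hand-tuned in LQT-HTD,
                optimized by reinforcement learning in LQT-RTO)
    """
    x, y, vx, vy = state
    xt, yt = target
    vx_ref = k1 * (xt - x)            # close the downrange gap
    vy_ref = k2 * (yt - y)            # descend toward the target altitude
    y_ref = yt + k3 * abs(xt - x)     # keep altitude margin for obstacles
    return np.array([x + vx_ref, y_ref, vx_ref, vy_ref])
```

Under this reading, LQT-RTO would search over (k1, k2, k3) with reinforcement learning to reduce fuel use while preserving landing success, and the LQT controller tracks whatever reference the design emits.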

Summary

INTRODUCTION

In 2009, surficial hydroxyl (OH) and potentially water (H2O) were detected on the sunlit lunar surface by near-infrared (NIR) spectrometers on the Chandrayaan-1, EPOXI, and Cassini spacecraft [1]. Exploring such sites up close motivates lunar hoppers, which need more continuous and dynamic trajectory (re)planning to avoid obstacles of different sizes as they fly over terrain at low altitude; as a result, alternative guidance methods are required. We propose a guidance law based on a linear quadratic tracking (LQT) controller that minimizes path-following errors from a dynamically created reference trajectory. We developed a 2-dimensional (2D) lunar surface simulation environment and applied reinforcement learning to find optimal feedback terms for the designed reference trajectory. The reward function of the lunar hopper simulation is based on LunarLanderContinuous-v2 and returns several types of rewards and penalties at each time step depending on the state, i.e., y, and control input, i.e., u, as shown in (1); an episode finishes if the lander crashes or comes to rest, receiving an additional -100 or +100 points, respectively.
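To make the guidance law concrete, here is a minimal sketch of a finite-horizon LQT controller, assuming simple discrete double-integrator dynamics along one axis; the matrices, weights, horizon, and reference below are illustrative placeholders, not the paper's dynamics model.

```python
import numpy as np

# Illustrative discrete double integrator along one axis: x = [position, velocity]
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])
Q = np.diag([10.0, 1.0])   # tracking-error weight (hypothetical)
R = np.array([[0.1]])      # control-effort weight (hypothetical)

def lqt_gains(A, B, Q, R, r_ref):
    """Finite-horizon LQT via a backward Riccati recursion.

    r_ref: (N+1, n) array of reference states. Returns feedback gains K[k]
    and feedforward inputs uff[k] so that u_k = -K[k] @ x_k + uff[k]
    minimizes the sum of (x_k - r_k)' Q (x_k - r_k) + u_k' R u_k.
    """
    N = len(r_ref) - 1
    S = Q.copy()            # terminal cost-to-go S_N = Q
    v = Q @ r_ref[-1]       # terminal costate v_N = Q r_N
    K, uff = [None] * N, [None] * N
    for k in range(N - 1, -1, -1):
        M = R + B.T @ S @ B
        K[k] = np.linalg.solve(M, B.T @ S @ A)   # feedback gain
        uff[k] = np.linalg.solve(M, B.T @ v)     # reference feedforward
        Acl = A - B @ K[k]
        v = Acl.T @ v + Q @ r_ref[k]             # costate recursion
        S = A.T @ S @ Acl + Q                    # Riccati recursion
    return K, uff

# Usage: track a straight descent from 100 m altitude to touchdown in 50 steps.
r_ref = np.linspace([100.0, -2.0], [0.0, 0.0], 51)
K, uff = lqt_gains(A, B, Q, R, r_ref)
x = np.array([100.0, -2.0])
for k in range(50):
    u = -K[k] @ x + uff[k]   # LQT control law
    x = A @ x + B @ u
```

The feedforward recursion is what distinguishes LQT from plain LQR: the costate v carries the upcoming reference into each gain computation, so the controller anticipates the trajectory rather than merely regulating the state to zero.

The terminal bonuses described above follow Gym's LunarLanderContinuous-v2, and the fuel penalties below (-0.3 per step for the main engine, -0.03 for the side engines) are taken from that environment; the rest of this sketch is a placeholder, not the paper's equation (1).

```python
def step_reward(shaping, prev_shaping, main_engine_on, side_engine_on,
                crashed, at_rest):
    """Reward shape patterned on Gym's LunarLanderContinuous-v2 (illustrative)."""
    reward = shaping - prev_shaping          # progress toward the landing pad
    reward -= 0.30 * main_engine_on          # fuel penalty, main engine
    reward -= 0.03 * side_engine_on          # fuel penalty, side engines
    if crashed:
        reward -= 100.0                      # terminal crash penalty
    elif at_rest:
        reward += 100.0                      # terminal landing bonus
    return reward
```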

DYNAMICS MODEL
REFERENCE TRAJECTORY DESIGN
COMPARATIVE ANALYSIS
CONCLUSIONS