Abstract

The growing population of space debris has a critical impact on the space environment, and active multi-debris removal (ADR) mission planning with a maximal-reward objective is therefore receiving increasing attention. Since the goal of Reinforcement Learning (RL) matches the maximal-reward optimization model of ADR, planning can be made more efficient with an appropriate RL scheme and algorithm. In this paper, first, an RL formulation is presented for the ADR mission planning problem, in which all the basic components of the maximal-reward optimization model are recast in the RL scheme. Second, a modified Upper Confidence bounds applied to Trees (UCT) search algorithm is developed for the ADR planning task; it leverages neural-network-assisted selection and expansion procedures to facilitate exploration, and incorporates roll-out simulation in the backup procedure to achieve robust value estimation. The algorithm fits the RL scheme of ADR mission planning and better balances exploration and exploitation. Experimental comparison on three subsets of the Iridium 33 debris cloud data shows that the modified UCT outperforms previously reported results and closely related UCT variants.
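As a rough, non-authoritative sketch of the kind of search step described above, the snippet below combines a prior-weighted (PUCT-style) selection rule with a random roll-out whose return is backed up along the visited path. All names (`UCTNode`, `select_child`, the `env` interface, `c_puct`) are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random

class UCTNode:
    """One search-tree node: visit count, value sum and a neural-network prior."""
    def __init__(self, state, prior=1.0, parent=None, action=None):
        self.state = state
        self.prior = prior          # policy-network probability (assumed interface)
        self.parent = parent
        self.action = action
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.4):
    """PUCT-style selection: exploitation term (Q) plus prior-weighted exploration."""
    def score(child):
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + u
    return max(node.children, key=score)

def rollout(env, state, max_depth=20):
    """Random roll-out used to estimate the value of a newly expanded node.
    `env.is_terminal`, `env.legal_actions` and `env.step` are an assumed interface."""
    total, depth = 0.0, 0
    while depth < max_depth and not env.is_terminal(state):
        action = random.choice(env.legal_actions(state))
        state, reward = env.step(state, action)
        total += reward
        depth += 1
    return total

def backup(node, value):
    """Propagate the roll-out return along the path back to the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent
```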

Highlights

  • Space debris in low Earth orbit (LEO) poses serious threats [1], [2] to future on-orbit missions [3]

  • A reinforcement learning framework is proposed for solving the Active (multiple) Debris Removal (ADR) mission planning problem

  • The components of the maximal-reward optimization model are cast into the Reinforcement Learning (RL) scheme (see the sketch after this list)
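One minimal way to cast the ADR planning model as an RL environment is sketched below: the state records which debris remain, the current target, the mission time and the remaining impulse budget; an action selects the next debris; the reward is collected on removal. The attribute layout and the placeholder functions `transfer_cost` and `removal_reward` are assumptions for illustration, not the paper's exact definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ADRState:
    """Planner state: debris still to remove, current target, elapsed time, fuel left."""
    remaining: frozenset      # indices of debris not yet removed
    current: int              # debris the OTV is currently at
    time: float               # mission elapsed time [days]
    fuel: float               # remaining delta-v budget [m/s]

def step(state, action, transfer_cost, removal_reward):
    """One deterministic transition: move to debris `action`, pay the time-dependent
    impulse cost, collect the removal reward.  `transfer_cost` and `removal_reward`
    stand in for the time-dependent cost and reward functions of the model."""
    dv, dt = transfer_cost(state.current, action, state.time)
    next_state = ADRState(
        remaining=state.remaining - {action},
        current=action,
        time=state.time + dt,
        fuel=state.fuel - dv,
    )
    # assumed convention: no reward if the transfer exceeds the fuel budget
    reward = removal_reward(action) if next_state.fuel >= 0.0 else 0.0
    return next_state, reward
```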


Summary

INTRODUCTION

Space debris in low Earth orbit (LEO) poses serious threats [1], [2] to future on-orbit missions [3]. In high-level ADR mission planning, the objective is influenced only by the actions taken at each state, which is the core property of a Markov Decision Process (MDP) [30]–[32]; an RL model is therefore applicable to this problem. The dynamic model is assumed to be subject to the J2 perturbation [8], [18], and the drift orbit transfer strategy [10]–[12], [15] is used to evaluate the impulse cost. Under these assumptions, the high-level plan can be solved as a reference for subsequent rendezvous planning and on-orbit operations, and all components of the planning problem are treated as deterministic functions, namely the time-dependent cost function, the state transition function, and the reward function. Here $a_p$ ($a_q$) and $I_p$ ($I_q$) denote the semi-major axis and inclination of debris #p (#q), and $\Delta\Omega_{pq}$ is the RAAN change the OTV must complete within the time interval $t_p$ to $t_q$.
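The drift orbit strategy exploits the fact that, under the J2 perturbation, the RAAN of an orbit precesses at a rate depending on its semi-major axis and inclination, so the transfer vehicle can wait in an intermediate orbit until the natural differential drift closes the RAAN gap. The sketch below uses the standard J2 secular-rate formula with assumed parameter names; it is an illustration, not the paper's cost function.

```python
import math

MU  = 3.986004418e14   # Earth's gravitational parameter [m^3/s^2]
J2  = 1.08262668e-3    # Earth's J2 zonal harmonic
R_E = 6378137.0        # Earth's equatorial radius [m]

def raan_drift_rate(a, i, e=0.0):
    """Secular RAAN drift rate dOmega/dt [rad/s] caused by J2 for an orbit with
    semi-major axis a [m], inclination i [rad] and eccentricity e."""
    n = math.sqrt(MU / a**3)          # mean motion
    p = a * (1.0 - e**2)              # semi-latus rectum
    return -1.5 * J2 * (R_E / p)**2 * n * math.cos(i)

def drift_time(a_p, i_p, a_q, i_q, delta_raan):
    """Rough time [s] for the relative J2 drift between two circular orbits to
    absorb a RAAN difference delta_raan [rad]; assumes constant drift rates."""
    relative_rate = raan_drift_rate(a_p, i_p) - raan_drift_rate(a_q, i_q)
    return abs(delta_raan / relative_rate)
```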

REINFORCEMENT LEARNING FORMULATION
BASIC DEFINITION
MODIFIED UCT ALGORITHM
Algorithm, line 12: pick an action from the set of legal actions according to the policy (see the sketch below)
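A minimal sketch of such a masked selection step is given below, assuming the policy output is a probability per candidate action; the function name and interface are illustrative, not taken from the paper's algorithm.

```python
import random

def pick_legal_action(policy_probs, legal_actions):
    """Sample an action from the policy, restricted (masked) to the legal actions.
    `policy_probs` maps every action to its (unnormalised) policy probability."""
    weights = [policy_probs[a] for a in legal_actions]
    total = sum(weights)
    if total <= 0.0:                      # degenerate policy: fall back to uniform
        return random.choice(legal_actions)
    return random.choices(legal_actions, weights=weights, k=1)[0]
```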
EXPERIMENT
CONCLUSION