Instance-based Policy Learning by Real-coded Genetic Algorithms and Its Application to Control of Nonholonomic Systems

Atsushi Miyamae, Jun Sakuma, Shigenobu Kobayashi, Isao Ono

doi:10.1527/tjsai.24.104

Abstract

The stabilization control of nonholonomic systems has been extensively studied because it is essential for nonholonomic robot control problems. The difficulty is that a control policy for such systems cannot always be derived theoretically. In this paper, we present a reinforcement learning (RL) method with instance-based policy (IBP) representation, in which control policies for this class of systems are optimized with respect to user-defined cost functions. Direct policy search (DPS) is an approach to RL in which the policy is represented by a parametric model and the model parameters are searched directly by optimization techniques such as genetic algorithms (GAs). In the IBP representation, an instance consists of a state-action pair, and a policy consists of a set of instances. Several DPS methods with IBP have been proposed previously. However, these methods sometimes fail to obtain optimal control policies when the state-action variables are continuous. In this paper, we present a real-coded GA for DPS with IBP that is specifically designed for continuous domains. Optimization of an IBP involves three difficulties: high dimensionality, epistasis, and multi-modality. Our method is designed to overcome these difficulties. Policy search with the IBP representation appears to be a high-dimensional optimization problem; however, the instances that can improve the fitness are often limited to the active instances (the instances actually used in the evaluation), and the number of active instances is small. Therefore, we treat the search problem as a low-dimensional problem by restricting the search variables to the active instances. It is well known that functions with epistasis can be optimized efficiently with crossovers that satisfy the inheritance of statistics. For efficient search of IBPs, we propose the extended crossover-like mutation (extended XLM), which generates a new instance around an existing instance while satisfying the inheritance of statistics.
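The instance-based representation and the notion of active instances can be illustrated with a minimal sketch. The abstract does not state how an action is chosen from the instance set, so the nearest-neighbor lookup below is an assumption, and the names `Instance`, `nearest_index`, and `rollout` are hypothetical, introduced only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Vector = Tuple[float, ...]


@dataclass
class Instance:
    """One instance of the policy: a stored state and the action paired with it."""
    state: Vector
    action: Vector


def nearest_index(policy: List[Instance], state: Vector) -> int:
    """Index of the instance whose stored state is closest to `state`.

    Assumption: Euclidean nearest-neighbor lookup; the abstract does not
    specify the lookup rule.
    """
    return min(range(len(policy)), key=lambda i: math.dist(policy[i].state, state))


def rollout(
    policy: List[Instance],
    initial_state: Vector,
    step: Callable[[Vector, Vector], Vector],
    horizon: int,
) -> Tuple[Vector, Set[int]]:
    """Run the policy for `horizon` steps and record which instances were used."""
    state = initial_state
    active: Set[int] = set()
    for _ in range(horizon):
        i = nearest_index(policy, state)
        active.add(i)  # this instance took part in the evaluation
        state = step(state, policy[i].action)
    return state, active


# Toy 1-D system (hypothetical): the next state is the state plus the action.
toy_policy = [
    Instance((-1.0,), (0.4,)),
    Instance((0.0,), (0.0,)),
    Instance((1.0,), (-0.4,)),
    Instance((5.0,), (-1.0,)),
]
final_state, active = rollout(
    toy_policy, (1.0,), lambda s, a: (s[0] + a[0],), horizon=5
)
```

In this toy run only two of the four instances are ever consulted, so a search restricted to the indices in `active` would operate on half of the variables. This is the kind of dimensionality reduction the text describes, shown here under the assumptions stated above.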
To overcome multi-modality, we propose extended CCM for selection. Extended CCM always chooses the individual that survives to the next generation from among a parent and the children generated from that parent. By doing so, the diversity of the population is expected to be well maintained. Our proposed method, FLIP (Functionally sophisticated Learner for IBP), consists of extended XLM and extended CCM. The effectiveness of FLIP is shown by experiments on nonholonomic control problems: a space robot, a car-like robot, and a parallel-type double inverted pendulum.
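The selection rule described above, in which the survivor is picked from a parent and its own children, can be sketched as follows. This is not the authors' extended CCM: the abstract gives no further detail, so the sketch shows only the family-wise replacement idea. The Gaussian perturbation is a plain stand-in for extended XLM, and the names `family_select`, `gaussian_children`, and `next_generation` are hypothetical.

```python
import random
from typing import Callable, List, Optional

Individual = List[float]


def family_select(
    parent: Individual,
    children: List[Individual],
    cost: Callable[[Individual], float],
) -> Individual:
    """Return the lowest-cost individual among a parent and its own children.

    The parent is listed first, so it is kept when a child only ties with it.
    """
    return min([parent, *children], key=cost)


def gaussian_children(
    parent: Individual, n_children: int, sigma: float, rng: random.Random
) -> List[Individual]:
    """Stand-in variation operator (an assumption, NOT extended XLM)."""
    return [[x + rng.gauss(0.0, sigma) for x in parent] for _ in range(n_children)]


def next_generation(
    population: List[Individual],
    cost: Callable[[Individual], float],
    n_children: int = 4,
    sigma: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[Individual]:
    """Each parent competes only with its own children for its population slot."""
    rng = rng or random.Random()
    return [
        family_select(p, gaussian_children(p, n_children, sigma, rng), cost)
        for p in population
    ]


def sphere(x: Individual) -> float:
    """Toy cost function used only for this illustration."""
    return sum(v * v for v in x)


rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
evolved = next_generation(population, sphere, rng=rng)
```

Because each slot is contested only within one family, a strong individual cannot displace individuals in other slots, which is one plausible reading of why the text expects diversity to be maintained.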

Journal: Transactions of the Japanese Society for Artificial Intelligence
Publication Date: Jan 1, 2009
Citations: 12
License type: free
