The stabilization control of nonholonomic systems have been extensively studied because it is essential for nonholonomic robot control problems. The difficulty in this problem is that the theoretical derivation of control policy is not necessarily guaranteed achievable. In this paper, we present a reinforcement learning (RL) method with instance-based policy (IBP) representation, in which control policies for this class are optimized with respect to user-defined cost functions. Direct policy search (DPS) is an approach for RL; the policy is represented by parametric models and the model parameters are directly searched by optimization techniques including genetic algorithms (GAs). In IBP representation an instance consists of a state and an action pair; a policy consists of a set of instances. Several DPSs with IBP have been previously proposed. In these methods, sometimes fail to obtain optimal control policies when state-action variables are continuous. In this paper, we present a real-coded GA for DPSs with IBP. Our method is specifically designed for continuous domains. Optimization of IBP has three difficulties; high-dimensionality, epistasis, and multi-modality. Our solution is designed for overcoming these difficulties. The policy search with IBP representation appears to be high-dimensional optimization; however, instances which can improve the fitness are often limited to active instances (instances used for the evaluation). In fact, the number of active instances is small. Therefore, we treat the search problem as a low dimensional problem by restricting search variables only to active instances. It has been commonly known that functions with epistasis can be efficiently optimized with crossovers which satisfy the inheritance of statistics. For efficient search of IBP, we propose extended crossover-like mutation (extended XLM) which generates a new instance around an instance with satisfying the inheritance of statistics. For overcoming multi-modality, we propose extended CCM for selection. Extended CCM always chooses the child for next generation among children and a parent which generates the children. By doing so, the diversity of the population is expected to be well maintained. Our proposals, FLIP (Functionally sophisticated Learner for IBP), consist of extended XLM and extended CCM. The effectiveness of FLIP is shown by experiments with nonholonomic control problems, a space robot, a car-like robot, and a parallel-type double inverted pendulum.
Read full abstract