Abstract

Demand for robots is shifting from industrial applications to domestic settings, where robots “live” with and interact with humans. Such robots require sophisticated body designs and interfaces. Humanoid robots with multiple degrees of freedom (MDOF) have been developed that can work alongside humans using a body design similar to a human’s. However, it is very difficult to control such robots intricately with behavior that is preprogrammed by hand. Instead, behavior should be acquired by the robots themselves in a human-like way rather than programmed manually: humans learn actions by trial and error or by emulating someone else’s actions. We therefore apply reinforcement learning to the control of humanoid robots, because it resembles human trial-and-error learning. Many existing reinforcement learning methods for control tasks discretize the state space using BOXES (Michie & Chambers, 1968; Sutton & Barto, 1998) or CMAC (Albus, 1981) to approximate a value function that specifies what is advantageous in the long run. However, these methods generalize poorly and suffer from perceptual aliasing. Other methods use basis function networks to handle continuous state spaces and actions. Networks with sigmoid functions suffer from catastrophic interference; they are suitable for off-line learning but not for on-line learning such as that needed for learning motion (Boyan & Moore, 1995; Schaal & Atkeson, 1996). In contrast, networks with radial basis functions are suitable for on-line learning, but they require a large number of units in the hidden layer because they cannot ensure sufficient generalization. To avoid this problem, methods of incremental allocation of basis functions and adaptive state-space formation have been proposed (Morimoto & Doya, 1998; Samejima & Omori, 1998; Takahashi et al., 1996; Moore & Atkeson, 1995). In this chapter, we propose a dynamic basis function allocation method, the Allocation/Elimination Gaussian Softmax Basis Function Network (AE-GSBFN), for reinforcement learning in continuous, high-dimensional state spaces. AE-GSBFN is a kind of actor-critic method that uses basis functions and has both allocation and elimination processes: if a basis function is required for learning, it is allocated dynamically; if an allocated basis function becomes redundant, it is eliminated. This method can therefore treat continuous, high-dimensional state spaces.
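To make the allocation/elimination idea concrete, the following is a minimal sketch in Python/NumPy of a Gaussian softmax basis function network with dynamic allocation and elimination. It is not the chapter’s actual implementation: the activation threshold g_min, the shared width sigma, and the inactivity-based pruning rule prune_after are illustrative assumptions, and the chapter defines its own allocation and elimination criteria.

    import numpy as np

    class GSBFN:
        """Sketch of a Gaussian softmax basis function network with
        dynamic allocation and elimination of basis units."""

        def __init__(self, dim, sigma=0.5, g_min=0.3, prune_after=1000):
            self.dim = dim                  # dimensionality of the continuous state space
            self.sigma = sigma              # shared width of each Gaussian basis (assumption)
            self.g_min = g_min              # allocate a new unit if max raw activation is below this
            self.prune_after = prune_after  # eliminate units inactive for this many steps (assumption)
            self.centers = np.empty((0, dim))  # basis centers
            self.weights = np.empty(0)         # linear output weights (e.g. critic values)
            self.last_used = np.empty(0)       # step at which each unit was last strongly active
            self.step = 0

        def _gaussians(self, x):
            """Raw (unnormalized) Gaussian activations of all units at state x."""
            if len(self.centers) == 0:
                return np.empty(0)
            d2 = np.sum((self.centers - x) ** 2, axis=1)
            return np.exp(-d2 / (2.0 * self.sigma ** 2))

        def activations(self, x):
            """Softmax-normalized activations: the 'Gaussian softmax' basis."""
            g = self._gaussians(x)
            return g / np.sum(g) if g.size else g

        def value(self, x):
            """Network output, e.g. the critic's value estimate at state x."""
            a = self.activations(x)
            return float(a @ self.weights) if a.size else 0.0

        def allocate_or_eliminate(self, x):
            """Allocate a unit where coverage is poor; eliminate units that stay inactive."""
            self.step += 1
            g = self._gaussians(x)
            if g.size == 0 or np.max(g) < self.g_min:
                # Allocation: no existing basis represents this state well,
                # so place a new unit centered at the current state.
                self.centers = np.vstack([self.centers,
                                          np.asarray(x, dtype=float).reshape(1, -1)])
                self.weights = np.append(self.weights, 0.0)
                self.last_used = np.append(self.last_used, float(self.step))
            else:
                self.last_used[np.argmax(g)] = self.step
            # Elimination: drop units that have become redundant (long inactive).
            keep = (self.step - self.last_used) < self.prune_after
            self.centers = self.centers[keep]
            self.weights = self.weights[keep]
            self.last_used = self.last_used[keep]

A hypothetical usage, with the TD error supplied by a critic that is not shown here, might look like:

    net = GSBFN(dim=2)
    x = np.array([0.1, -0.4])
    net.allocate_or_eliminate(x)                         # first call allocates a unit at x
    td_error = 1.0                                       # would come from the critic's TD update
    net.weights += 0.1 * td_error * net.activations(x)   # gradient-style weight update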
