Abstract

Acquisition of unique robotic motions by machine learning is a very attractive research theme in robotics. So far, various learning algorithms—e.g., adaptive learning, neural networks (NNs), and genetic algorithms (GAs)—have been proposed and applied to robots to achieve target tasks. Although the taxonomy varies among authors, learning methods can be classified roughly into supervised and unsupervised learning (Mitchell, 1997). In supervised learning, the ideal output for the target task is available as a teacher signal, and learning basically proceeds to produce a function that maps each input to an optimal output; the abovementioned learning methods belong to this category. Consequently, the learning results should always remain within the scope of our expectations. In unsupervised learning, by contrast, no teacher signal is explicitly given. Since the designers do not need to know the optimal (or desired) solution in advance, unexpected solutions may be found during the learning process. This article discusses in particular the application of unsupervised learning to produce robotic motions. One of the most typical forms of such learning is reinforcement learning (Kaelbling et al., 1996; Sutton & Barto, 1998). The concept of this learning method originally comes from behavioral psychology (Skinner, 1968). As seen in animal evolution, applying this learning method to robots is expected to have tremendous potential for finding unique robotic motions beyond our expectations. In fact, many reports on the application of reinforcement learning can be found in robotics (Mahadevan & Connell, 1992; Doya, 1996; Asada et al., 1996; Mataric, 1997; Kalmar et al., 1998; Kimura & Kobayashi, 1999; Kimura et al., 2001; Peters et al., 2003; Nishimura et al., 2005). For example, Doya succeeded in the acquisition of robotic walking (Doya, 1996). Kimura et al. have demonstrated that reinforcement learning enables effective advancement motions of mobile robots with several degrees of freedom (Kimura & Kobayashi, 1999; Kimura et al., 2001). As a unique challenge, Nishimura et al. achieved swing-up control of a real Acrobot—a two-link robot with a single actuator between the links—by switching among multiple controllers according to rules obtained by reinforcement learning (Nishimura et al., 2005). Among these studies, Q-learning, which is a method of
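To make the Q-learning mentioned above concrete, the following is a minimal tabular sketch on a toy one-dimensional corridor task. The environment, reward values, and hyperparameters are illustrative assumptions for exposition only, not taken from the studies cited above; the update rule is the standard one-step Q-learning target.

```python
import random

# Minimal tabular Q-learning sketch on a 1-D corridor of 5 cells.
# States 0..4; reaching state 4 (the goal) yields reward 1, else 0.
# Hyperparameters are illustrative assumptions.
N_STATES = 5
ACTIONS = [-1, +1]          # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic transition with walls at both ends."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(200):        # training episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # one-step Q-learning update toward the bootstrapped target
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy policy in each non-goal state: +1 means "move toward the goal".
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

The learned greedy policy moves right in every non-goal state, illustrating how the agent discovers goal-directed behavior from reward alone, without a teacher signal specifying the correct action.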
