A Q-learning with Selective Generalization Capability and its Application to Layout Planning of Chemical Plants

Yoichi Hirashima

doi:10.5772/6678

Abstract

In environments where the criteria for achieving a certain objective are unknown, reinforcement learning is known to be effective for collecting, storing and utilizing the information returned from the environment. Without a supervisor, the method can construct criteria for evaluating the actions that achieve the objective. However, the information received by a learning agent is obtained only through interaction between the agent and the environment. When complex actions are required to achieve the objective, the agent must therefore explore the environment widely and store vast amounts of data to construct these criteria. To overcome these drawbacks, function approximation methods with generalization capability have attracted attention as an effective approach. This chapter focuses on improving the learning performance of reinforcement learning by using a function approximation method: a modified version of the Cerebellar Model Articulation Controller (CMAC) (Albus, 1975a; Albus, 1975b). CMAC is a table look-up method with generalization capability, and it is known as a function learning method that does not require precise mathematical models of nonlinear functions. For this reason, CMAC is used to approximate evaluation functions in reinforcement learning in order to improve learning performance (Sutton & Barto, 1999; Watkins, 1989). In the CMAC, numerical information is stored in a distributed manner at memory locations as weights. Each weight is associated with a basis function that outputs a non-zero value only in a specified region of the input space. The CMAC input is quantized by a lattice constructed from these basis functions. To speed up learning and spread information to adjacent basis functions, the CMAC updates the group of weights associated with basis functions that are close to a given point, which yields its generalization capability.
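The mechanism just described (weights stored in a distributed manner, basis functions that are non-zero only on one lattice cell, and an update that touches every weight active at the input) can be sketched as follows. This is a minimal illustration of a conventional CMAC in its tile-coding form, not the chapter's modified CMAC. It assumes a box-shaped continuous input space, binary (indicator) basis functions, several lattices shifted diagonally by `width / n_tilings`, an LMS-style weight update, and one CMAC per discrete action for Q-learning. All names (`CMAC`, `n_tilings`, `n_bins`, `lr`, `q_learning_step`) are hypothetical.

```python
import numpy as np


class CMAC:
    """Conventional CMAC sketch (assumed setup): n_tilings lattices, each shifted
    diagonally by width / n_tilings, with one binary basis function per cell."""

    def __init__(self, n_tilings, n_bins, low, high, lr=0.1):
        self.n_tilings = n_tilings
        self.n_bins = n_bins
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        # Quantization interval of the lattice in each input dimension.
        self.width = (self.high - self.low) / n_bins
        # One extra cell per dimension so the shifted lattices still cover the input box.
        self.weights = np.zeros((n_tilings,) + (n_bins + 1,) * len(self.low))
        self.lr = lr

    def active_cells(self, x):
        """Return the index of the single cell hit by x in each lattice."""
        x = np.asarray(x, dtype=float)
        cells = []
        for t in range(self.n_tilings):
            offset = self.width * t / self.n_tilings
            idx = np.floor((x - self.low + offset) / self.width).astype(int)
            idx = np.clip(idx, 0, self.n_bins)
            cells.append((t,) + tuple(idx))
        return cells

    def predict(self, x):
        # Output = sum of the weights whose basis functions are non-zero at x.
        return sum(self.weights[c] for c in self.active_cells(x))

    def update(self, x, target):
        # Spread the error equally over all active weights; nearby inputs that
        # share some of these cells are adjusted too (generalization).
        error = target - self.predict(x)
        for c in self.active_cells(x):
            self.weights[c] += self.lr * error / self.n_tilings


def q_learning_step(cmacs, s, a, r, s_next, gamma=0.9, terminal=False):
    """One Q-learning update, assuming one CMAC approximates Q(s, a) per action a."""
    q_next = 0.0 if terminal else max(c.predict(s_next) for c in cmacs)
    cmacs[a].update(s, r + gamma * q_next)
```

For example, with `CMAC(n_tilings=4, n_bins=10, low=[0, 0], high=[1, 1], lr=0.5)` and repeated calls to `update([0.5, 0.5], 1.0)`, the output at `[0.58, 0.5]` becomes non-zero because that input shares one of the four active cells. A distant input such as `[0.9, 0.9]` is left unchanged.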
The concept of closeness stems from the assumption that, for well-behaved systems, similar inputs require similar outputs. The structure of the lattice determines how the CMAC input space is quantized and how generalization works. However, the conventional CMAC has a fixed lattice, so the region covered by the effects of generalization also has a fixed shape. The size of this region can be changed by adjusting the quantization intervals of the lattice, but its shape cannot be adjusted. Because the required size and shape of the region differ from case to case, it is difficult for the conventional CMAC to obtain appropriate generalization in each case.

Open Access Database www.intechweb.org
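The limitation described above can be made concrete by computing which inputs receive part of an update made at one point of a conventional 2-D CMAC. The sketch below is illustrative only. It assumes integer-valued inputs (so the cell arithmetic is exact), 4 lattices shifted diagonally by `width // n_tilings` (a common conventional layout, not necessarily the one used in the chapter), and hypothetical helper names. The region is the union of the cells that contain the update point, one per lattice.

```python
def shared_fraction(x, y, width, n_tilings):
    """Fraction of the offset lattices in which inputs x and y fall in the same cell."""
    step = width // n_tilings  # assumed diagonal shift between successive lattices
    shared = 0
    for t in range(n_tilings):
        off = t * step
        if all((xi + off) // width == (yi + off) // width for xi, yi in zip(x, y)):
            shared += 1
    return shared / n_tilings


def generalization_region(x, width, n_tilings, size):
    """Integer grid points in [0, size)^2 that receive part of an update made at x."""
    return {(i, j) for i in range(size) for j in range(size)
            if shared_fraction(x, (i, j), width, n_tilings) > 0}


def extent(points, axis):
    """Number of grid positions spanned by the region along one axis."""
    vals = [p[axis] for p in points]
    return max(vals) - min(vals) + 1


for width in (8, 16):
    region = generalization_region((40, 40), width, n_tilings=4, size=100)
    print(width, extent(region, 0), extent(region, 1), len(region))
```

With these settings the loop prints `8 14 14 148` and `16 28 28 592`. Doubling the quantization interval doubles the extent along each axis and quadruples the area. The diagonal staircase shape set by the lattice layout stays the same: for instance, `(34, 47)` lies inside the bounding box of the width-8 region but outside the region itself.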

Full Text