Abstract

This chapter discusses the emergence of cooperative behavior in multiagent systems and presents an objective-based reinforcement learning system for acquiring such behavior. Reinforcement learning is a method by which agents acquire optimal behavior through trial and error, receiving rewards from the environment as compensation for their actions (Kaelbling et al., 1996; Sutton & Barto, 1998; Weber et al., 2008). Most studies on reinforcement learning have addressed single-agent learning in a static environment. Q-learning, a typical learning method, has been proved to converge to an optimal solution for Markov decision processes (MDPs) (Watkins & Dayan, 1992). In a multiagent environment, however, the actions of multiple agents may affect the state transitions, so the environment is generally a non-Markov decision process (non-MDP), and whether learning is possible at all becomes a critical problem (Stone & Veloso, 2000). Addressing these problems, Arai et al. compared Q-learning with profit sharing (PS) (Grefenstette, 1988) on the pursuit problem in a grid environment (Arai et al., 1997). They found that Q-learning is unstable because its update equation uses the Q values of the successor state, whereas PS can absorb the uncertainty of the state transitions because it distributes a cumulative discounted reward; they therefore concluded that PS is more suitable than Q-learning in multiagent environments (Arai et al., 1997; Miyazaki & Kobayashi, 1998). Uchibe et al. demonstrated the capability of learning in a multiagent environment by estimating the relation between the actions of the learner and those of the other agents as a local prediction model (Uchibe et al., 2002). However, PS suffers from inadequate convergence because it reinforces all state-action pairs regardless of whether the purpose has been achieved (Nakano et al., 2005). This chapter presents an objective-based reinforcement learning system for multiple autonomous mobile robots that solves this problem and lets cooperative behavior emerge (Kobayashi et al., 2007). The proposed system employs PS as its learning method, but the single PS table used for learning is divided into two kinds of tables: one learns cooperative behavior from information on the other agents' positions, and the other learns how to control basic movements. Through computer simulations, …
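
To make the contrast above concrete, the following is a minimal sketch (not taken from the chapter) of the two update rules being compared: one-step Q-learning, which bootstraps on the Q values of the successor state, and a profit-sharing-style episodic update, which distributes a geometrically decaying share of the goal reward over the state-action pairs visited in an episode. The parameter values and the dictionary-based tables are assumptions for illustration only.

    from collections import defaultdict

    # Illustrative parameters; the chapter's actual values are not
    # given in the abstract.
    ALPHA = 0.1   # Q-learning step size
    GAMMA = 0.9   # Q-learning discount factor
    DECAY = 0.8   # profit-sharing credit-decay rate

    Q = defaultdict(float)    # Q[(state, action)] -> estimated value

    def q_update(s, a, reward, s_next, actions):
        # One-step Q-learning: bootstraps on the successor state's
        # Q values. In a multiagent setting s_next also depends on
        # the other agents' actions, which is the source of the
        # instability noted by Arai et al.
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

    PS = defaultdict(float)   # PS[(state, action)] -> accumulated weight

    def ps_update(episode, reward):
        # Profit sharing: on reaching the goal, reinforce every
        # (state, action) pair visited in the episode with a
        # geometrically decaying share of the reward, without
        # bootstrapping on successor states.
        credit = reward
        for s, a in reversed(episode):
            PS[(s, a)] += credit
            credit *= DECAY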
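
The division of the PS table described above could be realized along the following lines. This is a schematic sketch under assumed state encodings, an assumed additive combination of the two tables, and an assumed epsilon-greedy selection rule; it is not the chapter's implementation.

    import random
    from collections import defaultdict

    # Two PS tables, mirroring the proposed division: one keyed by the
    # other agents' positions (cooperative behavior), one keyed by the
    # local state (basic movement control).
    coop_table = defaultdict(float)   # key: (others_positions, action)
    move_table = defaultdict(float)   # key: (local_state, action)

    def select_action(others_positions, local_state, actions, eps=0.1):
        # Choose the action with the largest combined weight, with
        # occasional random exploration.
        if random.random() < eps:
            return random.choice(actions)
        return max(actions,
                   key=lambda a: coop_table[(others_positions, a)]
                                 + move_table[(local_state, a)])

    def reinforce(episode, reward, decay=0.8):
        # Sketch of objective-based credit assignment: apply the PS
        # update to each table along the episode; in the chapter each
        # table would be reinforced according to its own objective.
        credit = reward
        for others_positions, local_state, a in reversed(episode):
            coop_table[(others_positions, a)] += credit
            move_table[(local_state, a)] += credit
            credit *= decay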
