Multiple timescales PIA for cooperative reinforcement learning based on MDP model

Tomohiro Yamaguchi, Eri Imatani

doi:10.1109/sice.2007.4421462


Publication Date: Sep 1, 2007

Citations: 7

Affiliation: National Institute of Technology, Nara College

#Markov Decision Process Model #Markov Decision Process


Abstract

This paper describes a new dynamic programming (DP) based method for multiagent reinforcement learning in the Markov decision process (MDP) model. In a multiagent setting, it is difficult for agents to learn cooperative actions properly because they may all change their policies at the same time. To solve this problem, each agent should perform its policy improvement at a different time. We therefore propose a multiple timescales policy improvement method. We report comparative experiments between multiple timescales policy improvement and exclusive policy improvement. The results show that our method reduced the search cost of finding the optimal common-payoff Nash solution.
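The idea of letting agents improve their policies at different times can be illustrated with a minimal sketch. The abstract does not specify the update schedule, so everything below is an assumption made for illustration: two agents, a known common-payoff MDP given as arrays `P` and `R`, an integer update period per agent (`periods`), and sequential updates when two agents are scheduled in the same step. The function names (`evaluate`, `improve`, `multi_timescale_pia`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def evaluate(P, R, policies, gamma):
    """Value of the fixed joint policy: solve (I - gamma * P_pi) V = R_pi."""
    s = np.arange(P.shape[0])
    P_pi = P[s, policies[0], policies[1]]   # shape (S, S)
    R_pi = R[s, policies[0], policies[1]]   # shape (S,)
    return np.linalg.solve(np.eye(len(s)) - gamma * P_pi, R_pi)

def improve(P, R, policies, agent, gamma, tol=1e-9):
    """Greedy improvement of one agent's policy; the other agent is held fixed."""
    V = evaluate(P, R, policies, gamma)
    Q = R + gamma * (P @ V)                 # shape (S, A0, A1)
    s = np.arange(len(V))
    q_own = Q[s, :, policies[1]] if agent == 0 else Q[s, policies[0], :]
    current = policies[agent]
    best = q_own.argmax(axis=1)
    # Switch only on strict improvement, so ties cannot cause cycling.
    better = q_own[s, best] > q_own[s, current] + tol
    return np.where(better, best, current)

def multi_timescale_pia(P, R, periods=(1, 2), gamma=0.9, max_steps=1000):
    """Assumed schedule: agent i improves its policy only when t % periods[i] == 0."""
    n_states = P.shape[0]
    policies = [np.zeros(n_states, dtype=int) for _ in range(2)]
    quiet = 0  # consecutive steps without any policy change
    for t in range(max_steps):
        changed = False
        for agent, period in enumerate(periods):
            if t % period:
                continue  # this agent is not scheduled at step t
            new = improve(P, R, policies, agent, gamma)
            if not np.array_equal(new, policies[agent]):
                policies[agent] = new
                changed = True
        quiet = 0 if changed else quiet + 1
        # Every agent is scheduled at least once in max(periods) steps,
        # so a quiet window of that length means no agent can improve alone.
        if quiet >= max(periods):
            break
    return policies, evaluate(P, R, policies, gamma)

# Toy single-state coordination game with a shared payoff R[s, a0, a1].
P = np.ones((1, 2, 2, 1))
R = np.array([[[0.0, 1.0], [1.0, 3.0]]])
policies, V = multi_timescale_pia(P, R)
```

In this toy game both agents end up choosing action 1, the joint action with the highest shared payoff. In general, the loop stops at a joint policy that no single agent can improve on its own, which is a common-payoff Nash solution but not necessarily the optimal one.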
