Abstract
We present a new incremental fuzzy reinforcement learning algorithm for finding a sub-optimal policy for infinite-horizon Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). The algorithm addresses the high computational complexity of solving large Dec-POMDPs by generating a compact fuzzy rule-base for each agent. Each agent uses its own fuzzy rule-base to make decisions. The fuzzy rules in these rule-bases are created incrementally and tuned according to the agents' experiences. Reinforcement learning is used to tune the behavior of each agent so that the global reward is maximized. In addition, we propose a method for constructing the initial rule-base of each agent from the solution of the underlying MDP. This markedly improves the performance of the algorithm compared with random initialization of the rule-base. We assess the proposed method on several benchmark problems against state-of-the-art methods. Experimental results show that our algorithm achieves better or similar reward compared with the other methods, and that in terms of runtime it is superior to all previous methods. Using a compact fuzzy rule-base not only decreases the amount of memory used but also significantly speeds up the learning phase.
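The abstract does not say how the solution of the underlying MDP is turned into an initial rule-base, so the following is only a minimal sketch of the first half of that idea: solving a fully observable MDP by standard value iteration and reading off a greedy action per state. The toy two-state problem, the function names, and the "one seed action per state" mapping are all assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical sketch: value iteration on a toy "underlying MDP".
# Nothing here is the authors' implementation; names and data are illustrative.

def solve_underlying_mdp(transitions, rewards, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Return Q-values q[s][a] computed by standard value iteration.

    transitions[s][a] is a list of (probability, next_state) pairs.
    rewards[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(transitions)
    n_actions = len(transitions[0])
    values = [0.0] * n_states
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        # Bellman backup for every state-action pair.
        q = [
            [
                rewards[s][a]
                + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        new_values = [max(row) for row in q]
        delta = max(abs(nv - v) for nv, v in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return q


def greedy_seed_actions(q):
    """Assumed mapping: one seed action per state, the greedy one under q."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


# Toy problem (assumption): action 0 = stay, action 1 = move to the other state.
# Staying in state 1 pays a reward of 1; everything else pays 0.
toy_transitions = [
    [[(1.0, 0)], [(1.0, 1)]],  # from state 0
    [[(1.0, 1)], [(1.0, 0)]],  # from state 1
]
toy_rewards = [
    [0.0, 0.0],
    [1.0, 0.0],
]

q_values = solve_underlying_mdp(toy_transitions, toy_rewards)
seed_actions = greedy_seed_actions(q_values)
print(seed_actions)
```

In a Dec-POMDP the agents do not observe the state directly, so seed actions of this kind could at most serve as a starting point that learning then adjusts; how the paper bridges that gap is not stated in the abstract.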
More From: International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems