Abstract

To cope with sequential decision problems in non-Markov environments, learning classifier systems that use an internal register have been proposed. Because these systems control the internal register through the action part of their classifiers, in the same way as they choose actions in the environment, they do not always work well. In this paper, we develop an effective learning classifier system with two different rule sets, one for internal actions and one for external actions. The first rule set determines internal actions, that is, it consists of rules for controlling the internal register. It provides stable performance by separating control of the internal register from the action part of the classifiers. Each of its rules has the form “If [external state] & [internal state] then [internal action],” and we call the set of these rules the internal action table. The second rule set selects external actions as in the classical classifier system, but its structure differs slightly from the classical one: each rule has the form “If [external state] & [internal state] & [internal action] then [external action].” In the proposed system, aliased states in the environment are identified by observing the payoffs received by a classifier and referring to the internal action table. To demonstrate the efficiency and effectiveness of the proposed system, we apply it to the woods environments used in related work and compare its performance with that of existing classifier systems.
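As a rough illustration of the two rule sets, the following Python sketch assumes XCS-style ternary conditions over bit strings ('#' as a don't-care symbol) and treats an internal action as the new contents of the internal register. The names `TableRule`, `Classifier`, `internal_action_for` and `build_match_set`, the first-match table lookup, the "register unchanged" default, and the exact match on the internal-action part are all hypothetical choices, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List


def matches(condition: str, state: str) -> bool:
    """XCS-style ternary match (assumed encoding): '#' matches either bit."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )


@dataclass
class TableRule:
    """Internal action table entry:
    If [external state] & [internal state] then [internal action]."""
    ext_cond: str
    int_cond: str
    internal_action: str  # assumed: the new contents of the internal register


@dataclass
class Classifier:
    """If [external state] & [internal state] & [internal action]
    then [external action]."""
    ext_cond: str
    int_cond: str
    internal_action: str
    external_action: int


def internal_action_for(table: List[TableRule], ext_state: str, register: str) -> str:
    """Internal action of the first matching table entry (first-match is an
    assumption); with no match, the register is assumed to stay unchanged."""
    for rule in table:
        if matches(rule.ext_cond, ext_state) and matches(rule.int_cond, register):
            return rule.internal_action
    return register


def build_match_set(population: List[Classifier], ext_state: str,
                    register: str, internal_action: str) -> List[Classifier]:
    """Classifiers whose external, internal and internal-action parts all match.
    The internal-action part is assumed to require an exact match."""
    return [
        cl for cl in population
        if matches(cl.ext_cond, ext_state)
        and matches(cl.int_cond, register)
        and cl.internal_action == internal_action
    ]


# Usage: one table entry and three candidate classifiers.
table = [TableRule(ext_cond="0110", int_cond="0", internal_action="1")]
population = [
    Classifier("0110", "0", "1", external_action=2),
    Classifier("0110", "#", "0", external_action=5),
    Classifier("01#0", "0", "1", external_action=3),
]
register = "0"
ia = internal_action_for(table, "0110", register)
actions = [cl.external_action for cl in build_match_set(population, "0110", register, ia)]
```

Here the table first decides the internal action ("1"), and only classifiers whose internal-action part agrees with it enter the match set, so the second classifier is excluded and the external actions 2 and 3 remain candidates.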

Highlights

  • We develop a learning classifier system called XCSAT (XCS with an internal Action Table) for non-Markov environments or POMDPs, in which control of the internal register is separated from the classifiers and aliased positions or states are identified by detecting fluctuations in the payoffs received by classifiers

  • In XCSAT, once a fluctuation in payoffs indicates the presence of aliased states, the environmental information and the corresponding update of the internal register are recorded in the internal action table as a rule for updating the internal register
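The detect-and-record step in the highlights can be sketched as follows. This is not the paper's procedure: it assumes that payoff fluctuation is measured as the sample variance of a classifier's payoffs (computed with Welford's online algorithm) against a hypothetical threshold, that the internal action table is a dictionary keyed by (external state, register), and that a newly recorded internal action is simply a register value not yet used for that external state. All names (`PayoffStats`, `fluctuates`, `record_rule`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical table layout: (external state, register) -> internal action.
Table = Dict[Tuple[str, str], str]


@dataclass
class PayoffStats:
    """Running mean and variance of the payoffs one classifier receives
    (Welford's online algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, payoff: float) -> None:
        self.n += 1
        delta = payoff - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (payoff - self.mean)

    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def fluctuates(stats: PayoffStats, threshold: float, min_samples: int = 5) -> bool:
    """Assumed aliasing test: enough samples and sample variance above threshold."""
    return stats.n >= min_samples and stats.variance() > threshold


def unused_register_value(table: Table, ext_state: str, register: str,
                          width: int) -> Optional[str]:
    """Pick a register value not yet used for this external state (hypothetical)."""
    used = {register} | {v for (s, _), v in table.items() if s == ext_state}
    for i in range(2 ** width):
        candidate = format(i, f"0{width}b")
        if candidate not in used:
            return candidate
    return None


def record_rule(table: Table, ext_state: str, register: str, width: int) -> bool:
    """Add 'If ext_state & register then <new register value>' unless present."""
    key = (ext_state, register)
    if key in table:
        return False
    new_value = unused_register_value(table, ext_state, register, width)
    if new_value is None:
        return False
    table[key] = new_value
    return True


# Usage: a classifier in an aliased cell alternately receives high and low payoffs.
stats = PayoffStats()
for p in [1000.0, 0.0, 1000.0, 0.0, 1000.0, 0.0]:
    stats.update(p)
table: Table = {}
if fluctuates(stats, threshold=100.0):
    record_rule(table, "0110", "0", width=1)
```

Sample variance is only one possible fluctuation measure; an estimate such as XCS's classifier prediction error could play the same role, and the threshold would in any case need tuning to the payoff scale.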


Summary

INTRODUCTION

Classifier systems, whose if-then rules develop through interaction with the environment, were initially proposed as a computational model of cognition [12], [14], and they have since been widely applied to many areas, including autonomous robotics [8], [29], classification and data mining [33], [25], [15], traffic signal control [2], [4], and FPGA design [6]. Classifier systems are thought to need relatively few rules and to use memory efficiently, and genetic algorithms can readily be applied to a rule set represented in if-then format to evolve it. These features make classifier systems well suited to problems in non-Markov environments or POMDPs. In this paper, we develop a learning classifier system for non-Markov environments or POMDPs in which the mechanism for controlling the internal register is separated from the classifiers, and aliased states are identified by detecting fluctuations in the payoffs received by classifiers. Section 5 presents the experimental results of XCSAT in comparison with XCSM and XCSMH, and Section 6 concludes with some comments.

NON-MARKOV ENVIRONMENTS
LEARNING CLASSIFIER SYSTEMS WITH INTERNAL MEMORY
CLASSIFIER SYSTEM WITH AN INTERNAL ACTION
Rule representation and the internal action table
Update and usage of the internal action table
Algorithm of XCSAT
COMPUTATIONAL EXPERIMENT
2: Woods environment
Performance verification
Performance and adaptability for larger problems
CONCLUSION