Abstract
We investigate the performance of a learning classifier system in some simple multi-objective, multi-step maze problems, using both random and biased action-selection policies for exploration. Results show that the choice of action-selection policy can significantly affect the performance of the system in such environments. Further, this effect is directly related to population size, and we relate this finding to recent theoretical studies of learning classifier systems in single-step problems.
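To make the distinction between the two exploration policies concrete, the sketch below contrasts uniformly random action selection with roulette-wheel (payoff-proportionate) selection over a prediction array. This is only an illustration: the dictionary-based prediction array and the eight compass moves are assumptions for the example, not the article's implementation.

```python
import random

def random_action(prediction_array):
    """Unbiased exploration: every action is equally likely."""
    return random.choice(list(prediction_array))

def roulette_wheel_action(prediction_array):
    """Biased exploration: choose an action with probability proportional
    to its predicted payoff (roulette-wheel selection)."""
    actions = list(prediction_array)
    weights = [max(prediction_array[a], 0.0) for a in actions]
    total = sum(weights)
    if total == 0.0:                       # no payoff information yet: fall back to random
        return random.choice(actions)
    pick = random.uniform(0.0, total)
    running = 0.0
    for action, weight in zip(actions, weights):
        running += weight
        if pick <= running:
            return action
    return actions[-1]                     # guard against floating-point rounding

# Hypothetical prediction array for an agent that can move in eight compass directions
predictions = dict(zip(["N", "NE", "E", "SE", "S", "SW", "W", "NW"],
                       [12.0, 3.5, 0.0, 7.2, 1.1, 0.0, 4.8, 9.3]))
print(random_action(predictions), roulette_wheel_action(predictions))
```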
Highlights
This article is organized as follows: in Section 2 we briefly introduce related work, and in Section 3 we give an overview of the XCS classifier system.
In XCS, half of the learning trials are typically performed using a random action-selection policy, and half using a deterministic policy. Since our intention in these investigations is always to aim at the use of XCS for multi-objective problems performed by physical agents in the real world, we first investigate the performance of XCS under the traditional 50/50 random exploration scheme, and then investigate its performance using roulette-wheel action selection during the exploration trials.
We have found that roulette-wheel exploration can produce results comparable to those achieved using a random action-selection policy, at the expense of an increased population size N.
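The highlights above refer to the standard XCS scheme of alternating exploration and exploitation trials. A minimal sketch of that alternation, under assumed interfaces, is given below; the agent and environment objects are hypothetical placeholders rather than the XCS API, and the exploration policy is passed in so that random and roulette-wheel selection (as sketched after the Abstract) can be swapped directly.

```python
def run_trials(agent, env, n_trials, explore_policy):
    """Alternate exploration and exploitation trials (the usual 50/50 scheme).

    `agent` and `env` are hypothetical placeholders: `agent` is assumed to
    expose a prediction array for the current state and a learning update,
    `env` a reset/step interface. These names are not the XCS API itself.
    """
    for trial in range(n_trials):
        explore = (trial % 2 == 0)                  # half the trials explore, half exploit
        state = env.reset()
        done = False
        while not done:
            preds = agent.prediction_array(state)   # action -> predicted payoff
            if explore:
                action = explore_policy(preds)      # e.g. random_action or roulette_wheel_action
            else:
                action = max(preds, key=preds.get)  # deterministic: highest predicted payoff
            state, reward, done = env.step(action)
            if explore:
                agent.update(reward)                # learning is applied on exploration trials
```

In the article's comparison, the cost of switching the exploration policy from random to roulette-wheel selection shows up in the population-size parameter N, as noted in the highlight above.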
Summary
The needs of the moment may override longer-term objectives: the need for food is less than the need to avoid being eaten, the need for shelter may overcome the drive to mate, and so on. Balancing these conflicting drives is the difference between success and failure, between life and death, and the optimization of balancing behaviors is subject to great evolutionary pressure. In this article we investigate how the XCS classifier system performs on such multi-objective, multi-step maze problems, comparing the traditional 50/50 random exploration scheme with roulette-wheel action selection during the exploration trials.