Abstract

We investigate the performance of a learning classifier system in some simple multi-objective, multi-step maze problems, using both random and biased action-selection policies for exploration. Results show that the choice of action-selection policy can significantly affect the performance of the system in such environments. Further, this effect is directly related to population size, and we relate this finding to recent theoretical studies of learning classifier systems in single-step problems.

Highlights

  • The article is organized as follows: in Section 2 we briefly introduce some related work, and in Section 3 we give an overview of the XCS classifier system

  • Exploration in XCS is traditionally split 50/50, with half of the trials using a random action-selection policy and half using a deterministic policy. Since it is our intention in these investigations to aim always at the use of XCS for multi-objective problems performed by physical agents in the real world, we first investigate the performance of XCS using the traditional 50/50 random exploration, and then investigate its performance using roulette-wheel action selection during the exploration trials

  • We have found that roulette-wheel exploration can produce results comparable to those achieved using a random action-selection policy, at the expense of an increased population size N (see the sketch after this list)
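
To make the contrast between the two exploration policies concrete, the sketch below shows roulette-wheel (payoff-proportionate) action selection over a prediction array. It is a minimal illustration under our own assumptions: the function name, the dictionary representation of the prediction array, and the requirement of non-negative predictions are not taken from the paper.

    import random

    def roulette_wheel_action(prediction_array):
        """Pick an action with probability proportional to its prediction.

        `prediction_array` maps each action to the fitness-weighted payoff
        prediction XCS computes for it; predictions are assumed non-negative.
        """
        total = sum(prediction_array.values())
        if total <= 0:
            # Degenerate case: no usable information, fall back to uniform choice.
            return random.choice(list(prediction_array))
        pick = random.uniform(0, total)
        cumulative = 0.0
        for action, prediction in prediction_array.items():
            cumulative += prediction
            if pick <= cumulative:
                return action
        return action  # guard against floating-point rounding at the boundary

Because high-prediction actions are still favoured, roulette-wheel exploration biases the agent towards apparently promising moves, which is presumably part of its appeal for physical agents, where purely random actions can be costly.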

Summary

Introduction

The needs of the moment may override longer-term objectives: the need for food is less pressing than the need to avoid being eaten, the need for shelter may overcome the drive to mate, and so on. Balancing these conflicting drives is the difference between success and failure, between life and death, and the optimization of balancing behaviors is subject to great evolutionary pressure. In XCS, exploration is traditionally handled by splitting trials evenly, with half using a random action-selection policy and half using a deterministic policy. Since it is our intention in these investigations to aim always at the use of XCS for multi-objective problems performed by physical agents in the real world, we first investigate the performance of XCS using the traditional 50/50 random exploration, and then investigate its performance using roulette-wheel action selection during the exploration trials.
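
As a small, self-contained sketch of the traditional 50/50 alternation between explore and exploit trials, consider the following; the select_action helper and the toy prediction values are hypothetical, for illustration only.

    import random

    def select_action(prediction_array, explore):
        """Choose an action for one trial under the traditional XCS scheme."""
        if explore:
            # Exploration trial: pure random action selection.
            return random.choice(list(prediction_array))
        # Exploitation trial: deterministic, highest-prediction action.
        return max(prediction_array, key=prediction_array.get)

    # Trials alternate, so half explore and half exploit.
    predictions = {"N": 312.0, "E": 480.5, "S": 95.0, "W": 210.0}  # toy values
    for trial in range(4):
        explore = trial % 2 == 0
        action = select_action(predictions, explore)
        print(f"trial {trial}: {'explore' if explore else 'exploit'} -> {action}")

Replacing random.choice in the explore branch with the roulette-wheel selector sketched earlier yields the biased exploration policy examined in the later sections.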

Related Work
XCS: A Brief Description
Sequential Multi-objective Maze Problems
Concurrent Multi-objective Maze Problems
Roulette-wheel Exploration — Discussion
Findings
Conclusions
