Abstract

Initially, Anticipatory Classifier Systems (ACS) were designed to address both single- and multi-step decision problems. In the latter case, the objective is to maximize the total discounted reward, usually by means of Q-learning-based algorithms. Studies on other Learning Classifier Systems (LCS) revealed many real-world sequential decision problems where the preferred objective is the maximization of the average of successive rewards. This paper proposes a modification of the learning component that allows such problems to be addressed. The modified system, called AACS2 (Averaged ACS2), is tested on three multi-step benchmark problems.
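The distinction between the two objectives can be illustrated with tabular value updates. The sketch below is not taken from the paper: the hyperparameters (`ALPHA`, `BETA`, `GAMMA`) and the use of Schwartz-style R-learning as the average-reward update are illustrative assumptions, meant only to contrast the discounted criterion with the average-reward criterion mentioned in the abstract.

```python
from collections import defaultdict

# Illustrative hyperparameters (assumptions, not values from the paper).
ALPHA = 0.1   # learning rate for Q-values
BETA = 0.05   # learning rate for the average-reward estimate rho
GAMMA = 0.95  # discount factor (discounted criterion only)

def q_learning_update(Q, s, a, r, s_next, actions):
    """Discounted criterion: maximize the total discounted reward."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def r_learning_update(Q, rho, s, a, r, s_next, actions):
    """Average-reward criterion (R-learning sketch): the immediate reward
    is replaced by r - rho, i.e. the reward relative to the running
    estimate rho of the average reward per step. Returns the new rho."""
    target = r - rho + max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    # Update rho only when the taken action is currently greedy.
    if Q[(s, a)] == max(Q[(s, b)] for b in actions):
        rho += BETA * (r + max(Q[(s_next, b)] for b in actions)
                       - max(Q[(s, b)] for b in actions) - rho)
    return rho

# Usage on a toy transition (state 0, action 'a', reward 1.0 -> state 1):
Q = defaultdict(float)
q_learning_update(Q, 0, 'a', 1.0, 1, ['a', 'b'])

Q2, rho = defaultdict(float), 0.0
rho = r_learning_update(Q2, rho, 0, 'a', 1.0, 1, ['a', 'b'])
```

Note that the average-reward update carries no discount factor: instead of shrinking future rewards geometrically, it measures every reward against the estimated per-step average `rho`, which is what makes maximizing the average of successive rewards the effective objective.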

Highlights

  • We introduce the average reward criterion to yet another family of Learning Classifier Systems.

  • We describe the differences observed between ACS2 with the standard discounted reward distribution and two proposed modifications.

  • In all cases, the experiments were performed in an explore–exploit manner.

Anticipatory Classifier System with Average Reward Criterion in Discretized Multi-Step Environments. Appl.

