Using Learning Classifier Systems to Learn Stochastic Decision Policies

Gang Chen, Mengjie Zhang, Colin I J Douch

doi:10.1109/tevc.2015.2415464

Abstract

To solve reinforcement learning problems, many learning classifier systems (LCSs) are designed to learn state-action value functions through a compact set of maximally general and accurate rules. Most of these systems focus on learning deterministic policies by using a greedy action selection strategy. In practice, however, it may be more flexible and desirable to learn stochastic policies, which can be considered direct extensions of their deterministic counterparts. In this paper, we aim to achieve this goal by extending each rule with a new policy parameter. We also introduce a new method for adaptive learning of stochastic action selection strategies based on a policy gradient framework. Using this method, we have developed two new learning systems, one based on a regular gradient learning technique and the other based on a new natural gradient learning method. Both learning systems have been evaluated on three different types of reinforcement learning problems. The promising performance of the two systems shows that LCSs provide a suitable platform for efficient and reliable learning of stochastic policies.
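
As a rough illustration of what stochastic action selection driven by a policy gradient can look like, the sketch below keeps one real-valued policy parameter per action, samples actions from a softmax over those parameters, and applies a regular (REINFORCE-style) gradient update. This is a generic textbook formulation, not the systems described in the abstract: rule matching, per-rule parameters, and the natural gradient variant are not modeled, and all names (`softmax`, `select_action`, `policy_gradient_step`, `train`, `prefs`, `lr`) are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Turn real-valued policy parameters into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(prefs, rng):
    """Sample an action index in proportion to its softmax probability."""
    probs = softmax(prefs)
    return rng.choices(range(len(prefs)), weights=probs, k=1)[0]


def policy_gradient_step(prefs, action, reward, lr=0.1):
    """One regular-gradient update: prefs += lr * reward * grad log pi(action)."""
    probs = softmax(prefs)
    return [
        p + lr * reward * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]


def train(rewards, steps=2000, seed=0):
    """Assumed toy setting: a single state with a fixed reward per action."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    for _ in range(steps):
        a = select_action(prefs, rng)
        prefs = policy_gradient_step(prefs, a, rewards[a])
    return softmax(prefs)
```

Calling `train([1.0, 0.0])` in this toy setting shifts probability toward the rewarded action while the policy remains stochastic, in contrast to a greedy rule that always picks the highest-valued action.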
