Abstract

Considerable attention has been paid to the issue of value function approximation in the reinforcement learning literature [3]. One of the fundamental assumptions underlying algorithms for solving reinforcement learning problems is that states and state-action pairs have well-defined values that can be approximated and used to help determine an optimal policy. The quality of those approximations is a critical factor in determining the success of many algorithms for solving reinforcement learning problems.

In most classifier systems, the information about the value function is stored and computed by individual rules. Each rule maintains an independent estimate of the value of taking its designated action in the states that match its condition. From this standpoint, each rule is treated as a separate function approximator. The quality of the approximations that can be achieved by such simple estimates is not very good. Even when those estimates are pooled to compute a more reliable collective estimate, it is still questionable how good the overall approximation will be. It is also not clear how best to improve the quality of those approximations.

One approach to improving approximation quality is to increase the computational abilities of individual rules so that they become more capable function approximators [4]. Another is to look back to the original concepts underlying the classifier system framework and take advantage of the properties of distributed representations in classifier systems [2]. This paper follows in the spirit of the latter approach, looking for ways to tap the distributed representational power present in a collection of rules to improve the quality of value function approximations.

Previous work [1] introduced a new approach to value function approximation in classifier systems called hyperplane coding. Hyperplane coding is a closely related variation of tile coding [3] in which classifier rule conditions fill the role of tiles, with few restrictions on the way those tiles are organised. The basic idea is to treat rules as features that collectively specify a linear gradient-descent function approximator. The hypothesis behind this idea is that classifier rules can be more effective function approximators if they work together to implement a distributed, coarse-coded representation of the value function.

Experiments with hyperplane coding have shown that, by carefully using the resources available in a random population of classifiers, continuous value functions can be approximated with a high degree of accuracy. This approach computes much better approximations than more conventional classifier system methods in which individual rules compute approximations independently. The results to date also demonstrate that hyperplane coding can achieve levels of performance comparable to those of more well-known approaches to function approximation such as tile coding. High-quality value function approximations that provide both data recovery and generalisation are a critically important component of most approaches to solving reinforcement learning problems.
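
To make the coarse-coding idea concrete, the sketch below implements a minimal hyperplane-style approximator in Python, with ternary rule conditions acting as binary features of a linear estimator trained by gradient descent. The class names, the ternary condition syntax, the averaging over matched rules, and the learning-rate setting are illustrative assumptions for this sketch, not the implementation from [1].

```python
import random

class Classifier:
    """A rule condition over binary inputs: each position is '0', '1', or '#' (don't care)."""
    def __init__(self, condition):
        self.condition = condition   # e.g. "1#0#"
        self.weight = 0.0            # this rule's contribution to the value estimate

    def matches(self, state):
        """True if the condition matches the binary state string."""
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

class HyperplaneApproximator:
    """Treats each rule as one feature of a linear, coarse-coded approximator.

    The value estimate for a state is the average weight of all matching rules
    (averaging is an assumption made here to keep the sketch stable when match
    counts vary), and weights are adjusted by gradient descent on the squared
    error, with rule conditions playing the role that tiles play in tile coding.
    """
    def __init__(self, classifiers, alpha=0.2):
        self.classifiers = classifiers
        self.alpha = alpha           # learning rate

    def predict(self, state):
        matched = [c for c in self.classifiers if c.matches(state)]
        if not matched:
            return 0.0, matched
        return sum(c.weight for c in matched) / len(matched), matched

    def update(self, state, target):
        """One gradient-descent step toward the target value for this state."""
        estimate, matched = self.predict(state)
        if not matched:
            return
        step = self.alpha * (target - estimate) / len(matched)
        for c in matched:
            c.weight += step

# Toy usage: approximate f(x) = x^2 on 4-bit inputs with a random rule population.
def random_condition(length):
    return ''.join(random.choice('01#') for _ in range(length))

random.seed(0)
pop = [Classifier(random_condition(4)) for _ in range(100)]
approx = HyperplaneApproximator(pop)
for _ in range(5000):
    x = random.randrange(16)
    approx.update(format(x, '04b'), (x / 15.0) ** 2)
print(approx.predict(format(10, '04b'))[0])  # should be a rough estimate near (10/15)**2 ≈ 0.44
```

No single rule carries the estimate for any state; the population as a whole forms the distributed, coarse-coded representation described above.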
Because hyperplane coding substantially improves the quality of the approximations that can be computed by a classifier system using relatively small populations of classifiers, it may provide the foundation for significant improvements in classifier system performance. One open question remaining about hyperplane coding is how the quality of the approximation is affected by the set of classifiers in the population. A random population of classifiers is sufficient to obtain good results; would a more carefully chosen population do even better? The obvious next step in this research is to use the approximation resources available in a random population as a starting point for a more refined approach that adaptively reallocates resources to gain greater precision in those regions of the input space where it is needed. This paper shows how to compute such an adaptive function approximation. The goal is to learn a population of classifiers that reflects the structure of the input space (Dean & Wellman, 1991). This means more rules (i.e., more tiles) should be used to approximate regions that are sampled often and in which the function values vary a great deal, and fewer rules should be used in regions that are rarely sampled and in which the function is nearly constant. We discuss how to adaptively manage the space in the population, as well as how to structure the search for tiles that reduce the approximation error.
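
The abstract does not specify how this reallocation is carried out, so the sketch below only illustrates the general idea in one plausible form, reusing the Classifier and HyperplaneApproximator classes from the earlier sketch: rules that rarely match are recycled into more specific conditions centred on recently observed high-error states. The helper names (specialise_toward, reallocate), the utility measure (recent match counts), and the replacement heuristic are all assumptions, not the search procedure developed in the paper.

```python
import random

def specialise_toward(condition, state, n_fix=1):
    """Return a narrower condition: fix some '#' positions to the bits of a
    high-error state so that more approximation resources cover that region."""
    positions = [i for i, c in enumerate(condition) if c == '#']
    chosen = set(random.sample(positions, min(n_fix, len(positions))))
    return ''.join(state[i] if i in chosen else c for i, c in enumerate(condition))

def reallocate(approx, error_log, match_counts, n_replace=5):
    """Recycle rarely matched rules into rules focused on high-error states.

    approx        -- a HyperplaneApproximator (from the sketch above)
    error_log     -- recent (state, |target - estimate|) pairs
    match_counts  -- how often each classifier matched recently
    """
    # Rules that rarely match contribute little to any estimate; reclaim their slots.
    victims = sorted(approx.classifiers, key=lambda c: match_counts.get(c, 0))[:n_replace]
    # States with the largest recent errors need finer coverage.
    worst = [s for s, _ in sorted(error_log, key=lambda p: p[1], reverse=True)[:n_replace]]
    for victim, state in zip(victims, worst):
        donors = [c for c in approx.classifiers if c.matches(state)]
        template = random.choice(donors).condition if donors else '#' * len(state)
        victim.condition = specialise_toward(template, state)
        victim.weight = 0.0   # the repurposed rule starts with a fresh estimate
```

Run periodically during training, a loop along these lines concentrates rules (i.e., tiles) in regions that are sampled often and where the function values vary, and frees resources in regions that are rarely sampled or nearly constant, which is the allocation pattern described above.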
