Approximation spaces in off-policy Monte Carlo learning

James F Peters,Christopher Henry

doi:10.1016/j.engappai.2006.11.005

Abstract

This paper introduces an approach to off-policy Monte Carlo (MC) learning guided by behaviour patterns gleaned from approximation spaces and rough set theory introduced by Zdzisław Pawlak in 1981. During reinforcement learning, an agent makes action selections in an effort to maximize a reward signal obtained from the environment. The problem considered in this paper is how to estimate the expected value of cumulative future discounted rewards in evaluating agent actions during reinforcement learning. The solution to this problem results from a form of weighted sampling using a combination of MC methods and approximation spaces to estimate the expected value of returns on actions. This is made possible by considering behaviour patterns of an agent in the context of approximation spaces. The framework provided by an approximation space makes it possible to measure the degree that agent behaviours are a part of (“covered by”) a set of accepted agent behaviours that serve as a behaviour evaluation norm. Furthermore, this article introduces an adaptive action control strategy called run-and-twiddle (RT) (a form of adaptive learning introduced by Oliver Selfridge in 1984), where approximate spaces are constructed on a “need by need” basis. Finally, a monocular vision system has been selected to facilitate the evaluation of the reinforcement learning methods. The goal of the vision system is to track a moving object, and rewards are based on the proximity of the object to the centre of the camera field of view. The contribution of this article is the introduction of a RT form of off-policy MC learning.

Full Text