Abstract

This paper investigates an echo state network (ESN) (Jaeger, 2001; Maass and Markram, 2002) architecture as an approximation of the Q-function for temporally dependent rewards embedded in a linear dynamical system, the mass-spring-damper (MSD). This problem has been solved with feed-forward neural networks (FNN) when all state information necessary to specify the dynamics is provided as input (Kretchmar, 2000). Time-delayed neural networks (TDNN) solve this problem using finite-size windows of incomplete state information. Our research demonstrates that the ESN architecture represents the Q-function of the MSD system given incomplete state information as well as current feed-forward neural networks do given either the perfect state or a temporally windowed, incomplete state vector. The remainder of this paper is organized as follows. We introduce basic concepts of reinforcement learning and the echo state network architecture. The MSD system simulation is defined in section IV. Experimental results for learning state quality given incomplete state information are presented in section V. Results for learning estimates of all future state qualities given incomplete state information are presented in section VI. Section VII discusses the potential of the ESN for use in reinforcement learning and outlines current and future directions of research.
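To make the architecture concrete, the following is a minimal sketch of an ESN used as a Q-function approximator. All sizes, weight scales, and the action encoding are illustrative assumptions, not the paper's actual hyperparameters; the reservoir update x(t+1) = tanh(W x(t) + W_in u(t)) with a linear readout is the standard ESN formulation from Jaeger (2001).

```python
import numpy as np

rng = np.random.default_rng(0)

n_reservoir = 50   # reservoir units (assumed size)
n_input = 2        # e.g. partial MSD observation + applied force (assumed)
n_actions = 3      # discrete action set (assumed)

# Random input and reservoir weights; the reservoir matrix is rescaled
# to spectral radius < 1 so the echo state property holds.
W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_input))
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_out = np.zeros((n_actions, n_reservoir))  # linear readout, trained later

def esn_step(x, u):
    """One reservoir update: x(t+1) = tanh(W x(t) + W_in u(t))."""
    return np.tanh(W @ x + W_in @ u)

def q_values(x):
    """Linear readout mapping the reservoir state to Q(s, a) per action."""
    return W_out @ x

# Drive the reservoir with a short observation sequence; the reservoir
# state accumulates temporal context that the readout can exploit.
x = np.zeros(n_reservoir)
for t in range(10):
    u = np.array([np.sin(0.1 * t), 0.0])  # toy partial observation, zero force
    x = esn_step(x, u)
q = q_values(x)
```

Only the readout `W_out` would be trained (e.g. by temporal-difference updates), which is what makes the ESN attractive here: the fixed recurrent reservoir supplies the memory needed to represent the Q-function from incomplete state information.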
