Abstract

In reinforcement learning tasks, neural networks have often been used as function approximators for an action-value function, which gives the expected total reward for an agent's action. Since a reward is given by the environment only after the agent takes an action, the agent's action-value function is inevitably learned incrementally. It is well known that this type of incremental learning can cause "catastrophic interference," which leads to the unlearning of previously acquired knowledge in neural networks. To solve this problem, in this chapter we introduce a memory mechanism into an extended model of Radial Basis Function (RBF) networks, in which hidden units are adaptively allocated for incoming inputs. In this model, several representative input-output pairs are extracted from the trained action-value function and stored in an external memory. As the agent improves its policy through trial and error, a gradient-based learning algorithm reduces not only the usual temporal difference (TD) error but also the network errors on some of the extracted pairs. Reducing both errors allows the proposed model to learn a desired action-value function stably. To evaluate its incremental learning ability, the proposed model is applied to two standard problems, the Random Walk Task and the Mountain-Car Task, in which the agent's working space is extended as learning proceeds. In the simulations, we verify that the proposed model acquires proper action-values with smaller memory resources than several conventional models.
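The learning rule summarized above, in which each gradient step reduces both the TD error and the errors on representative pairs held in an external memory, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the linear-in-features RBF form, fixed centers, and all names (`RBFQApprox`, `beta`, etc.) are assumptions for illustration; the chapter's model additionally allocates hidden units adaptively.

```python
import numpy as np

def rbf_features(x, centers, width):
    # Gaussian RBF activations for a single input x (illustrative fixed-center form)
    d = np.linalg.norm(centers - x, axis=1)
    return np.exp(-(d ** 2) / (2 * width ** 2))

class RBFQApprox:
    """Minimal linear-in-features RBF action-value approximator (illustrative)."""
    def __init__(self, centers, n_actions, width=1.0, lr=0.1):
        self.centers = np.asarray(centers, dtype=float)
        self.width = width
        self.lr = lr
        self.w = np.zeros((n_actions, len(self.centers)))  # one weight vector per action

    def q(self, x, a):
        # Approximate action-value Q(x, a)
        return self.w[a] @ rbf_features(np.asarray(x, float), self.centers, self.width)

    def update(self, x, a, td_target, memory, beta=0.5):
        """One gradient step: reduce the TD error for (x, a) and, with weight
        beta, the network errors on the stored representative pairs."""
        phi = rbf_features(np.asarray(x, float), self.centers, self.width)
        td_err = td_target - self.w[a] @ phi
        self.w[a] += self.lr * td_err * phi              # usual TD update
        for mx, ma, mq in memory:                        # rehearse stored pairs
            mphi = rbf_features(np.asarray(mx, float), self.centers, self.width)
            err = mq - self.w[ma] @ mphi
            self.w[ma] += self.lr * beta * err * mphi    # pull network back toward stored value
        return td_err
```

Because each update also pulls the network back toward the stored input-output pairs, fitting new regions of the state space does not overwrite the values already learned elsewhere, which is the intended defense against catastrophic interference.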
