Abstract

We study value function initialization for a reinforcement learning agent that faces a set of tasks varying in their rewards and sampled in a lifelong manner. Existing work on this transfer reinforcement learning setting typically assumes a uniform sampling of tasks in its experiments and transfers knowledge optimistically by initializing with the maximum outcome seen so far. In the real world, however, infrequent events make the task distribution non-uniform. As a consequence, optimistic initialization becomes impractical: it assigns equally high importance to frequent and infrequent tasks, which increases sample complexity. We argue that to overcome this limitation, the agent must be able to assess how optimism is influenced by its uncertainty and confidence, two interrelated notions that play a crucial role in decision-making. We therefore propose UCOI (Uncertainty and Confidence aware Optimistic Initialization), a novel approach that applies optimism only in appropriate situations, and we show that it yields advantageous results over existing methods, especially for tasks drawn from a non-uniform distribution.
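
The abstract contrasts the optimistic initialization used in prior work (seeding the value function with the maximum outcome observed across past tasks) with an initialization that takes uncertainty and confidence into account. The minimal Python sketch below only illustrates that contrast; it is not the paper's implementation of UCOI, and the names `past_returns`, `task_frequencies`, `confidence_threshold`, and the frequency-based confidence heuristic are illustrative assumptions.

```python
import numpy as np

def optimistic_init(past_returns, num_states, num_actions):
    """Baseline from prior work: initialize Q with the maximum outcome
    seen across previously encountered tasks."""
    return np.full((num_states, num_actions), max(past_returns))

def uncertainty_aware_init(past_returns, task_frequencies,
                           num_states, num_actions,
                           confidence_threshold=0.5):
    """Hypothetical uncertainty/confidence-aware variant, in the spirit of
    UCOI: use the maximum only when confidence in the optimistic estimate
    is high; otherwise fall back to a frequency-weighted expected outcome."""
    past_returns = np.asarray(past_returns, dtype=float)
    freqs = np.asarray(task_frequencies, dtype=float)
    probs = freqs / freqs.sum()

    # Stand-in confidence measure (an assumption for this sketch): the
    # probability mass of previously seen tasks whose outcome is close to
    # the maximum observed outcome.
    confidence = probs[past_returns >= past_returns.max() - 1e-8].sum()

    if confidence >= confidence_threshold:
        init_value = past_returns.max()      # optimism is warranted
    else:
        init_value = probs @ past_returns    # expected outcome instead

    return np.full((num_states, num_actions), init_value)
```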
