Abstract

We propose a new method for recommending preferable solutions to a user in interactive reinforcement learning. Interactive reinforcement learning differs from ordinary reinforcement learning in that a human gives the reward function to the learner interactively. This means the reward function may not be fixed from the learner's point of view, because an end-user may change his or her mind or preference. However, most previous reinforcement learning methods assume that the reward function is fixed and that the optimal solution is unique, so they are of little use in interactive reinforcement learning with such an end-user. To solve this, the learner must estimate the user's preference and take its changes into account. This paper proposes a new method for matching an end-user's preferred solution with the learner's recommended solution. Experiments were performed with twenty subjects to evaluate the effectiveness of our method. The experimental results show that a large number of subjects prefer each every-visit-optimal solution to the optimal solution, while a small number of subjects prefer each every-visit-non-optimal solution. We discuss the reason why the end-users' preferences are divided into two groups.
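
To make the interactive setting concrete, the following is a minimal sketch of a learner whose reward signal comes from the end-user rather than from a fixed reward function, and whose preferred goal can change mid-training. It assumes a tabular Q-learning agent on a toy chain environment; the environment, the `HumanRewardModel` class, and all parameters are illustrative assumptions, not the method or experimental setup of the paper.

```python
# Illustrative sketch (not the paper's method): interactive RL where the
# reward is supplied by a human stand-in whose preference may change.
import random

N_STATES = 5          # states 0..4 on a chain; episodes end at either end
ACTIONS = [-1, +1]    # move left or right


class HumanRewardModel:
    """Stands in for the end-user who gives rewards interactively.

    Unlike a fixed reward function, the preferred goal state can change
    during learning, which is the situation the abstract highlights.
    """

    def __init__(self, preferred_goal):
        self.preferred_goal = preferred_goal

    def reward(self, state):
        return 1.0 if state == self.preferred_goal else 0.0


def run(episodes, human, q, alpha=0.1, gamma=0.9, eps=0.1):
    for _ in range(episodes):
        s = N_STATES // 2
        while 0 < s < N_STATES - 1:
            # epsilon-greedy action selection
            a = random.choice(ACTIONS) if random.random() < eps else \
                max(ACTIONS, key=lambda a_: q[(s, a_)])
            s_next = s + a
            r = human.reward(s_next)   # reward comes from the human, not the environment
            best_next = max(q[(s_next, a_)] for a_ in ACTIONS) \
                if 0 < s_next < N_STATES - 1 else 0.0
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next


q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
human = HumanRewardModel(preferred_goal=N_STATES - 1)  # user initially prefers the right end
run(200, human, q)
human.preferred_goal = 0                               # the user changes his or her mind
run(200, human, q)                                     # the learner must adapt to the new preference
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_STATES - 1)})
```

In this toy setup the greedy policy printed at the end reflects the user's most recent preference; the paper's contribution concerns how the learner should recommend solutions when such preference changes occur.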
