Abstract
We propose a new method for recommending solutions that match an end-user's preferences in interactive reinforcement learning. Interactive reinforcement learning differs from ordinary reinforcement learning in that a human gives the reward function to the learner interactively. As a consequence, the reward function may not be fixed from the learner's point of view, because an end-user may change his or her mind or preference. However, most previous reinforcement learning methods assume that the reward function is fixed and that the optimal solution is unique, so they are of little use in interactive reinforcement learning with such an end-user. To address this, the learner must estimate the user's preference and take its changes into account. This paper proposes a new method for matching an end-user's preferred solution with the solution recommended by the learner. Experiments with twenty subjects were performed to evaluate the effectiveness of our method. The experimental results show that a large number of subjects prefer the every-visit-optimal solutions to the optimal solution, whereas a small number of subjects prefer the every-visit-non-optimal solutions. We discuss why the end-users' preferences are divided into these two groups.
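The abstract does not spell out the learning loop, so the following is only a minimal sketch of the interactive setting it describes: tabular Q-learning in which the reward is supplied step by step by the end-user rather than by a fixed reward function. The environment interface (`env.reset`, `env.step`, `env.actions`) and the `query_user_reward` callback are hypothetical stand-ins, not the authors' implementation, and the sketch does not cover the paper's preference-matching or every-visit-optimal recommendation procedure.

```python
import random
from collections import defaultdict

def interactive_q_learning(env, query_user_reward, episodes=50,
                           alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning where the reward comes from the end-user
    (query_user_reward) instead of a fixed reward function, so the
    learned values track the user's current, possibly changing, preference."""
    q = defaultdict(float)  # (state, action) -> estimated value

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the environment's actions.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])

            next_state, done = env.step(action)

            # The human supplies the reward interactively; nothing here
            # assumes the reward function stays the same between episodes.
            reward = query_user_reward(state, action, next_state)

            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q
```

In this sketch, a change of mind by the user simply shows up as different values returned by `query_user_reward`, which is why methods that assume a fixed reward and a unique optimal solution break down in this setting.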