Abstract

Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user. Previous research showed that humans give copious feedback early in training but only sparsely thereafter. In this article, we investigate the potential of agents learning from trainers’ facial expressions by interpreting them as evaluative feedback. To do so, we implemented TAMER, a popular interactive reinforcement learning method, in a reinforcement-learning benchmark problem, Infinite Mario, and conducted the first large-scale study of TAMER, involving 561 participants. Using a purpose-designed CNN–RNN model, our analysis shows that telling trainers to use facial expressions, and introducing competition, both improve the accuracy of estimating positive and negative feedback from facial expressions. In addition, our results from a simulation experiment show that learning solely from feedback predicted from facial expressions is possible, and that with sufficiently strong prediction models or a regression method, facial responses would significantly improve agent performance. Furthermore, our experiment supports previous studies demonstrating the importance of bi-directional feedback and competitive elements in the training interface.

Highlights

  • Intelligent autonomous agents have the potential to become our high-tech companions in the family of the future

  • An agent learns from human reward, i.e., evaluations of the quality of the agent’s behavior provided by a human user, in a reinforcement learning framework


Introduction

Intelligent autonomous agents have the potential to become our high-tech companions in the family of the future. The ability of these intelligent agents to learn efficiently from non-technical users to perform a task in a natural way will be key to their success. An agent can learn from human reward, i.e., evaluations of the quality of the agent’s behavior provided by a human user, within a reinforcement learning framework. In contrast to traditional reinforcement learning, interactive reinforcement learning (Interactive RL) was developed to allow an ordinary human user to shape the agent learner by providing evaluative feedback [22, 32, 33, 48, 49]. The agent’s optimal behavior is determined by the evaluations provided by the human teacher.
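The idea of shaping an agent from human evaluative feedback can be sketched in a few lines. The following is a minimal, illustrative TAMER-style learner, not the authors' implementation: the agent keeps a tabular estimate of the human reward for each state–action pair (the class name, state labels, and step size here are hypothetical), updates it from the trainer's scalar feedback, and acts greedily on that estimate rather than on an environment reward.

```python
import random


class TamerStyleAgent:
    """Sketch of learning from human reward: maintain an estimate
    H[state][action] of the trainer's feedback and act greedily on it."""

    def __init__(self, actions, step_size=0.1):
        self.actions = list(actions)
        self.step_size = step_size
        self.H = {}  # state -> {action: estimated human reward}

    def act(self, state):
        # Greedy with respect to the learned human-reward model;
        # explore randomly in states never seen before.
        values = self.H.get(state)
        if not values:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: values.get(a, 0.0))

    def update(self, state, action, feedback):
        # Move the estimate toward the trainer's scalar feedback
        # (e.g., +1 for approval, -1 for disapproval).
        values = self.H.setdefault(state, {a: 0.0 for a in self.actions})
        values[action] += self.step_size * (feedback - values[action])


# Simulated trainer: approve "right", disapprove "left" in state "s0".
agent = TamerStyleAgent(["left", "right", "jump"])
for _ in range(50):
    agent.update("s0", "right", 1.0)
    agent.update("s0", "left", -1.0)
```

After a handful of consistent feedback signals, the greedy policy in `"s0"` selects `"right"`; in the article's setting, the scalar feedback in the update step would instead come from a prediction model applied to the trainer's facial expressions.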
