Abstract

In this work, we address a relatively unexplored aspect of designing agents that learn from human reward: how an agent's non-task behavior affects the human trainer's training and the agent's learning. Our investigation builds on the TAMER framework, which facilitates the training of agents by human-generated reward signals, i.e., judgements of the quality of the agent's actions. Starting from the premise that the interaction between the agent and the trainer should be bi-directional, we propose two new training interfaces that increase a human trainer's active involvement in the training process and thereby improve the agent's task performance. One shares the agent's uncertainty, a metric computed from data coverage; the other shares its task performance. Results from a 51-subject user study show that these interfaces can induce trainers to train longer and give more feedback. The agent's performance, however, increases only in response to the addition of performance-oriented information, not to the shared uncertainty levels. These results suggest that the organizational maxim about human behavior, "you get what you measure" (i.e., sharing metrics with people causes them to focus on optimizing those metrics while de-emphasizing other objectives), also applies to the training of agents. Using principal component analysis, we show how trainers in the two conditions train agents differently. In addition, by simulating the influence of the agent's uncertainty-informative behavior on a human's training behavior, we show that trainers can be distracted by the agent sharing its uncertainty levels about its actions, giving poor feedback in order to reduce the agent's uncertainty without improving its performance.
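The abstract describes the uncertainty signal only as a metric computed from data coverage. As a rough illustration (the paper's exact formula is not given here, so the function below, its name, and its parameters are assumptions), such a metric can be approximated by how far the current state lies from the states for which the trainer has already given feedback:

    import numpy as np

    def coverage_uncertainty(state, labeled_states, k=5, scale=1.0):
        """Illustrative data-coverage uncertainty (hypothetical, not
        necessarily the paper's metric): high when the current state is
        far from states the trainer has already labeled, low when it is
        densely covered by past feedback."""
        if len(labeled_states) == 0:
            return 1.0  # no feedback yet -> maximal uncertainty
        dists = np.linalg.norm(np.asarray(labeled_states) - np.asarray(state), axis=1)
        k = min(k, len(dists))
        mean_knn = np.sort(dists)[:k].mean()  # mean distance to k nearest labeled states
        # Squash into (0, 1]: nearby labeled samples -> low uncertainty.
        return float(1.0 - np.exp(-mean_knn / scale))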

Highlights

  • Autonomous agents have the potential to assist people in their daily lives, e.g., help them do the laundry at home, clean the house, perform individualized customer service, etc.

  • One popular method is based on reinforcement learning (RL), an area of machine learning concerned with how to map situations to actions so as to maximize a numerical reward signal [31]. It incorporates real-time human reward, supplied by a trainer who observes the agent's behavior, reflecting the trainer's judgement of the quality of the agent's actions [14,18,29].

  • In the TAMER framework [18], one solution proposed for learning from human reward, the agent learns from this feedback by directly building a predictive model of the human trainer's feedback and, at each time step, myopically choosing the action that it predicts will receive the highest feedback value.



Introduction

Autonomous agents (such as robots and software agents) have the potential to assist people in their daily lives, e.g., help them do the laundry at home, clean the house, or perform individualized customer service. To adapt to novel situations, they need to be able to learn new skills from any ordinary person who has much task knowledge but little expertise in autonomous agents or in technology in general. How well these agents can learn from human users will depend heavily on how efficiently they can interact with them. An agent implemented according to the TAMER framework learns from real-time evaluations of its behavior provided by a human trainer. From these evaluations, which we refer to as "reward", the TAMER agent creates a predictive model of future human reward and chooses the actions it predicts will elicit the greatest human reward. The trainer observes the agent's behavior and can give reward corresponding to its quality.
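As a minimal sketch of the learning loop just described, consider the simplified learner below. It is an assumption-laden illustration, not the framework's actual implementation: it uses one linear model of human reward per action and applies feedback to the most recent step immediately, whereas TAMER additionally handles credit assignment over the delay between an action and the trainer's response. The class name TamerAgent and its methods are hypothetical.

    import numpy as np

    class TamerAgent:
        """Sketch of a TAMER-style learner: it models the human
        trainer's reward directly and acts myopically on it."""

        def __init__(self, n_features, n_actions, lr=0.1):
            # One linear model per action, predicting the human reward
            # H(s, a) the trainer would give for taking a in state s.
            self.weights = np.zeros((n_actions, n_features))
            self.lr = lr

        def predict_reward(self, state):
            # Predicted human reward for every action in this state.
            return self.weights @ state

        def act(self, state):
            # Myopic selection: choose the action with the highest
            # predicted immediate human reward (no discounted return).
            return int(np.argmax(self.predict_reward(state)))

        def update(self, state, action, human_reward):
            # Supervised step: move the prediction for (state, action)
            # toward the reward the trainer actually gave.
            error = human_reward - self.weights[action] @ state
            self.weights[action] += self.lr * error * state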


