Abstract

Interactive Reinforcement Learning (IRL) uses human input to improve learning speed and enable learning in more complex environments. Action advice is one of the input channels preferred by human users. However, many existing IRL approaches do not explicitly consider the possibility of inaccurate human action advice. Moreover, most approaches that do account for inaccurate advice compute trust in human action advice independently of the state. This can lead to problems in practical cases, where human input may be inaccurate only in some states while remaining useful in others. To address this, we propose a novel algorithm that can handle state-dependent, unreliable human action advice in IRL. We combine three potential indicator signals for unreliable advice: consistency of advice, retrospective optimality of advice, and behavioral cues that hint at human uncertainty. We evaluate our method in a simulated gridworld and in robotic sorting tasks with 28 subjects. We show that our method outperforms a state-independent baseline and analyze occurrences of behavioral cues related to unreliable advice.
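To make the idea of state-dependent trust concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of how the three indicator signals named in the abstract could be combined into a per-state trust estimate. All class names, weights, and update rules here are illustrative assumptions.

```python
# Hypothetical sketch: per-state trust over human action advice, combined from
# three indicator signals mentioned in the abstract: consistency of repeated
# advice, retrospective optimality of past advice, and behavioral cues of
# human uncertainty. Weights and update rules are illustrative assumptions.
from collections import defaultdict


class StateDependentTrust:
    def __init__(self, w_consistency=0.4, w_optimality=0.4, w_cues=0.2):
        # Weights are illustrative, not values from the paper.
        self.w = (w_consistency, w_optimality, w_cues)
        self.consistency = defaultdict(lambda: 1.0)   # agreement of repeated advice per state
        self.optimality = defaultdict(lambda: 1.0)    # how often past advice turned out optimal
        self.cue_penalty = defaultdict(lambda: 0.0)   # accumulated uncertainty cues per state

    def update(self, state, advice_consistent, advice_was_optimal, uncertainty_cue, lr=0.1):
        # Exponential moving averages keep the estimates local to each state.
        self.consistency[state] += lr * (float(advice_consistent) - self.consistency[state])
        self.optimality[state] += lr * (float(advice_was_optimal) - self.optimality[state])
        self.cue_penalty[state] += lr * (float(uncertainty_cue) - self.cue_penalty[state])

    def trust(self, state):
        # Combine the three signals into a single trust score in [0, 1].
        wc, wo, wu = self.w
        score = (wc * self.consistency[state]
                 + wo * self.optimality[state]
                 - wu * self.cue_penalty[state])
        return max(0.0, min(1.0, score))


# Usage: follow the advised action only if trust in the current state is high enough.
trust = StateDependentTrust()
trust.update(state=(2, 3), advice_consistent=True, advice_was_optimal=False, uncertainty_cue=True)
follow_advice = trust.trust((2, 3)) > 0.5
```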
