Abstract

Robots can tackle human–robot interaction (HRI) tasks through inverse reinforcement learning (IRL). However, the performance of offline IRL agents is upper-bounded by their experts, and limited demonstrations fail to provide a complete picture of the environment, especially in real-world applications. To push IRL's performance further, we implement a cross-modal inverse reinforcement learning to reinforcement learning (IRL-to-RL) transition framework for a real-world HRI task, in which a mall receptionist android promotes sanitizer usage. Over the 10-day experiment, the android develops a more proactive and effective strategy than the human expert. Furthermore, we explore four decay modes of prior-knowledge supervision and suggest a preferable pattern for practical use. Our results demonstrate the feasibility of the framework for helping robots switch to diverse modalities, learn incrementally with a sparse reward function, and eventually outperform the human expert. We expect our framework to inspire further IRL-to-better learning paradigms, enabling robots to outcompete their teachers in more real-world HRI applications.
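The decaying prior-knowledge supervision described above can be sketched as a schedule on the weight given to the IRL-derived prior reward while the sparse environment reward gradually takes over. The four decay schedules and the blending function below are illustrative assumptions, not the paper's exact formulations:

```python
import math

# Hypothetical decay schedules for the weight lam(t) placed on
# IRL-derived prior supervision during the IRL-to-RL transition.
# The forms (linear, exponential, step, cosine) are assumptions
# standing in for the paper's four decay modes.

def linear_decay(t, total):
    """Weight falls linearly from 1 to 0 over `total` steps."""
    return max(0.0, 1.0 - t / total)

def exponential_decay(t, total, rate=5.0):
    """Weight falls as exp(-rate * t / total)."""
    return math.exp(-rate * t / total)

def step_decay(t, total, drops=4):
    """Weight halves at evenly spaced milestones."""
    return 0.5 ** int(t / total * drops)

def cosine_decay(t, total):
    """Smooth cosine annealing from 1 to 0."""
    return 0.5 * (1.0 + math.cos(math.pi * min(t, total) / total))

def blended_reward(env_reward, prior_reward, t, total, decay=linear_decay):
    """Sparse environment reward plus a decaying prior-knowledge term."""
    lam = decay(t, total)
    return (1.0 - lam) * env_reward + lam * prior_reward
```

Early in training the agent is steered almost entirely by the expert-derived prior; as the weight decays, the sparse real-world reward dominates, which is what allows the learner to improve beyond the demonstrator.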
