Until recently, proposals for intelligent service companions, such as robots, were mostly independent of the user and the environment. Our work is part of the FUI-RoboPopuli project, which focuses on endowing entertainment companion robots with adaptive and social behavior. More precisely, we focus on the capacity of an intelligent system to learn how to personalize and adapt its behavior/actions according to its interaction situation, which describes (a) the attributes of the current user and (b) the attributes of the current environment. Our approach is based on Markov decision processes (MDPs), which are widely used in adaptive robotic applications. To obtain relevant adaptive behavior as quickly as possible, regardless of the interaction situation, several approaches have been proposed to reduce the sample complexity required to learn the MDP model, including its reward function. We argue that systems that detect the attributes important to each decision are likely to converge faster than those that do not. To this end, we present two algorithms that learn the MDP reward function by analyzing interaction traces (i.e., the interaction history between the robot and its users, including their feedback on the robot's actions). The first algorithm is direct and certain, but does not exploit its knowledge to adapt to unknown situations (i.e., unknown user and/or environment settings). The second detects which situation attributes are important to the adaptation process; this detection is used to speed up learning and to generalize the learned reward function to unknown situations. In this paper, we present both learning algorithms, simulated experiments, and an experiment with the EMOX (EMOtion eXchange) robot, which was upgraded during the FUI-RoboPopuli project. The results of these experiments show that, for adaptive decision making, detecting the attributes important to each decision speeds up the learning process and helps achieve convergence with fewer samples. We also present a scaling analysis based on the simulated experiments.
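As a rough illustration of the idea only (not the paper's actual algorithms), the sketch below learns a tabular reward estimate from interaction traces, where each trace pairs a situation (a set of user and environment attributes) and an action with user feedback. A simple spread-of-means test then flags the attributes whose value visibly changes the feedback, so that learned rewards could be generalized to unseen situations by ignoring unimportant attributes. The attribute names, feedback scale, and importance test are all assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean

def learn_reward(traces):
    """Average the observed feedback for each (situation, action) pair.
    Each trace is (situation: dict, action: str, feedback: float in [-1, 1])."""
    sums, counts = defaultdict(float), defaultdict(int)
    for situation, action, feedback in traces:
        key = (tuple(sorted(situation.items())), action)
        sums[key] += feedback
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def important_attributes(traces, action, threshold=0.2):
    """Flag attributes whose value shifts the mean feedback for a given action.
    A crude spread-of-means test; the paper's detection mechanism may differ."""
    relevant = [(situation, feedback) for situation, a, feedback in traces if a == action]
    attributes = {name for situation, _ in relevant for name in situation}
    important = set()
    for name in attributes:
        by_value = defaultdict(list)
        for situation, feedback in relevant:
            by_value[situation.get(name)].append(feedback)
        means = [mean(feedbacks) for feedbacks in by_value.values()]
        if means and max(means) - min(means) > threshold:
            important.add(name)
    return important

# Toy usage: feedback depends on the user's age group but not on the room.
traces = [
    ({"age_group": "child", "room": "kitchen"}, "dance", 1.0),
    ({"age_group": "child", "room": "living"}, "dance", 0.8),
    ({"age_group": "adult", "room": "kitchen"}, "dance", -0.5),
    ({"age_group": "adult", "room": "living"}, "dance", -0.3),
]
rewards = learn_reward(traces)
print(important_attributes(traces, "dance"))  # expected: {'age_group'}
```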