Abstract

While earlier research in human-robot interaction predominantly uses rule-based architectures for natural language interaction, these approaches are not flexible enough for long-term interactions in the real world due to the large variation in user utterances. In contrast, data-driven approaches map the user input directly to the agent output and therefore handle this variation more flexibly without requiring a set of rules. However, data-driven approaches are generally applied to single dialogue exchanges with a user and do not build up a memory over long-term conversations with different users. Long-term interactions require remembering users and their preferences incrementally and continuously, and recalling previous interactions with users to adapt and personalise the interactions, known as the lifelong learning problem. In addition, it is desirable to learn user preferences from only a few interaction samples (i.e., few-shot learning). These are known to be challenging problems in machine learning, whereas they are trivial for rule-based approaches, creating a trade-off between flexibility and robustness. Correspondingly, in this work, we present the text-based Barista Datasets, generated to evaluate the potential of data-driven approaches in generic and personalised long-term human-robot interactions with simulated real-world problems, such as recognition errors, incorrect recalls, and changes to user preferences. Based on these datasets, we explore the performance and the underlying inaccuracies of state-of-the-art data-driven dialogue models that are strong baselines in other domains of personalisation in single interactions, namely Supervised Embeddings, Sequence-to-Sequence, End-to-End Memory Network, Key-Value Memory Network, and Generative Profile Memory Network. The experiments show that while data-driven approaches are suitable for generic task-oriented dialogue and real-time interactions, no model performs sufficiently well to be deployed in personalised long-term interactions in the real world, because of their inability to learn and use new identities and their poor performance in recalling user-related data.

Highlights

  • Learning and recalling aspects about a user to personalise interactions is needed for coherent and lifelike human-robot interactions (HRI) (Lim et al., 2011)

  • It is important to note that a rule-based dialogue manager using template matching achieves 100% accuracy on the Barista Datasets (Irfan et al., 2020a), because the datasets were created from a set of rules with deterministic bot utterances (see the sketch after this list)

  • The main conclusions from this work are: 1) Sequence-to-Sequence and End-to-End Memory Network are suitable for generic task-oriented dialogue, achieving up to near-perfect accuracy, 2) none of the models could reach 90% accuracy for personalised long-term dialogue, even when trained on a high number of (10,000) dialogue samples and user preference information was provided, 3) the underlying reasons behind the models' inaccuracies in personalised task-oriented dialogue were identified as the lack of capability to use new customer names or order items, the poor performance in recalling user preferences, and user recognition errors, and 4) all models are suitable for real-time interactions in terms of response times. These results indicate that data-driven architectures are not yet ready to be deployed for personalised long-term human-robot interactions in the real world
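The following is a minimal, illustrative sketch (in Python) of such a template-matching dialogue manager; the patterns and bot utterances are hypothetical examples, not the actual rules used to generate the Barista Datasets. Because every user utterance in a rule-generated dataset matches one of the generating templates and the bot responses are deterministic, a manager of this kind can reproduce each target response exactly, which is why it reaches 100% accuracy.

    # Toy template-matching dialogue manager (illustrative sketch only;
    # the rules below are hypothetical, not the actual Barista rules).
    import re

    # Each rule maps a regular-expression template for the user utterance
    # to a deterministic bot response (optionally filled with captured groups).
    RULES = [
        (re.compile(r"\b(hi|hello)\b", re.IGNORECASE),
         "Hello! What would you like to order?"),
        (re.compile(r"i(?:'d| would) like an? (?P<drink>\w+)", re.IGNORECASE),
         "One {drink} coming up. Anything else?"),
        (re.compile(r"no,? that(?:'s| is) all", re.IGNORECASE),
         "Great, your order is confirmed."),
    ]

    def respond(utterance: str) -> str:
        """Return the deterministic response of the first matching template."""
        for pattern, template in RULES:
            match = pattern.search(utterance)
            if match:
                return template.format(**match.groupdict())
        return "Sorry, I didn't understand that."

    print(respond("Hi"))                    # -> Hello! What would you like to order?
    print(respond("I would like a latte"))  # -> One latte coming up. Anything else?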


Summary

Introduction

Learning and recalling aspects about a user to personalise interactions is needed for coherent and lifelike human-robot interactions (HRI) (Lim et al., 2011). Conversations with a robot are challenging, because users may assume multi-modal capabilities based on the various sensors of the robot (e.g., camera, microphones, speakers, tablet) (Goodrich and Schultz, 2007; Rickert et al., 2007), as well as expect the robot to recognise them and recall their previous interactions. Automatic speech recognition errors may arise from various accents, users who speak quietly, and pronunciation errors of non-native speakers, which could decrease the robustness of rule-based approaches (Irfan et al., 2020a).
