Abstract

A robot agent designed to engage in real-world human–robot joint action must be able to understand the social states of the human users it interacts with in order to behave appropriately. In particular, in a dynamic public space, a crucial task for the robot is to determine the needs and intentions of all of the people in the scene, so that it only interacts with people who intend to interact with it. We address the task of estimating the engagement state of customers for a robot bartender based on the data from audiovisual sensors. We begin with an offline experiment using hidden Markov models, confirming that the sensor data contains the information necessary to estimate user state. We then present two strategies for online state estimation: a rule-based classifier based on observed human behaviour in real bars, and a set of supervised classifiers trained on a labelled corpus. These strategies are compared in offline cross-validation, in an online user study, and through validation against a separate test corpus. These studies show that while the trained classifiers are best in a cross-validation setting, the rule-based classifier performs best with novel data; however, all classifiers also change their estimate too frequently for practical use. To address this issue, we present a final classifier based on Conditional Random Fields: this model has comparable performance on the test data, with increased stability. In summary, though, the rule-based classifier shows competitive performance with the trained classifiers, suggesting that for this task, such a simple model could actually be a preferred option, providing useful online performance while avoiding the implementation and data-scarcity issues involved in using machine learning for this task.
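As a rough illustration of the rule-based strategy described above, the sketch below encodes a hand-written engagement rule of the kind derived from observed customer behaviour in real bars. The attribute names, the distance threshold, and the exact decision logic are assumptions made for this sketch, not the paper's actual feature set or implementation.

```python
from dataclasses import dataclass

@dataclass
class CustomerObservation:
    """One frame of audiovisual sensor output for a single tracked customer.
    Field names and units are illustrative assumptions, not the paper's schema."""
    distance_to_bar_m: float  # estimated distance from the bar counter, in metres
    facing_robot: bool        # whether the estimated head pose is towards the bartender

def seeks_engagement(obs: CustomerObservation, max_distance_m: float = 0.5) -> bool:
    """Hand-coded rule in the spirit of a rule-based engagement classifier:
    a customer is judged to be seeking engagement when they are close to the
    bar and looking towards the bartender. The 0.5 m threshold is a placeholder."""
    return obs.distance_to_bar_m <= max_distance_m and obs.facing_robot

# Example: a customer standing right at the bar and looking at the robot
print(seeks_engagement(CustomerObservation(distance_to_bar_m=0.4, facing_robot=True)))  # True
```

A rule of this form has no parameters to train, which is one reason such a classifier avoids the implementation and data-scarcity issues mentioned above.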

Highlights

  • Robots will become more and more integrated into daily life over the coming decades, with the expectation that the market for service robots will increase greatly over the next 20 years [31]

  • We present a final classifier based on Conditional Random Fields: this model has comparable performance on the test data, with increased stability (a minimal sketch of this approach follows the highlights list)

  • The robot acknowledged a customer on average about 6–7 s after they first became visible, and a customer received a drink about a minute after their initial appearance—note that this last number includes the full time for the spoken interaction, as well as the 20 s normally taken by the robot arm to physically grasp and hand over the drink
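As a minimal, hypothetical sketch of the CRF-based approach mentioned in the last highlight, the example below labels a per-frame feature sequence for each customer track with a linear-chain CRF, using the third-party sklearn-crfsuite package. The feature names, labels, and toy data are assumptions for illustration, not the paper's corpus or feature set.

```python
# Sequence labelling with a linear-chain CRF via sklearn-crfsuite (third-party package).
import sklearn_crfsuite

# Each customer track is a sequence of per-frame feature dicts ...
train_sequences = [
    [
        {"close_to_bar": True,  "facing_robot": True},
        {"close_to_bar": True,  "facing_robot": True},
        {"close_to_bar": True,  "facing_robot": False},
    ],
    [
        {"close_to_bar": False, "facing_robot": False},
        {"close_to_bar": True,  "facing_robot": False},
    ],
]
# ... paired with one engagement label per frame (illustrative labels).
train_labels = [
    ["SeekingEngagement", "SeekingEngagement", "NotSeekingEngagement"],
    ["NotSeekingEngagement", "NotSeekingEngagement"],
]

# Linear-chain CRF: transition weights between successive labels provide the
# temporal smoothing that reduces spurious changes in the engagement estimate.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(train_sequences, train_labels)

test_sequence = [[{"close_to_bar": True, "facing_robot": True},
                  {"close_to_bar": True, "facing_robot": True}]]
print(crf.predict(test_sequence))  # e.g. [['SeekingEngagement', 'SeekingEngagement']]
```

Because a linear-chain CRF scores transitions between successive labels, it tends to produce smoother label sequences than a frame-by-frame classifier, which is the property that motivates its use for a more stable engagement estimate.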



Introduction

Robots will become more and more integrated into daily life over the coming decades, with the expectation that the market for service robots will increase greatly over the next 20 years [31]. The resulting interactions, especially those in public spaces, differ in several ways from the companion-style interactions that have traditionally been considered in social robotics (e.g., [9,14,34]). Interactions in public spaces are often short-term, dynamic, multimodal, and multi-party. In a public setting, it is not enough for a robot to achieve its task-based goals; it must also be able to satisfy the social goals and obligations that arise through interactions with people in real-world settings. We argue that task-based, social interaction in a public space can be seen as an instance of multimodal joint action [32,58]. We consider a socially aware robot bartender that has been developed as part of this work.


