Abstract

The use of social robots in healthcare systems and nursing homes to assist the elderly and their caregivers is becoming common, making robots' understanding of the engagement of the elderly important. Traditional engagement estimation (EE) often requires expert involvement in a controlled dyadic interaction environment. In this article, we propose a supervised machine learning method to estimate the engagement state of the elderly in a multiparty human–robot interaction (HRI) scenario from real-world video recordings. The method is built upon the basic concept of engagement in geriatric psychiatry and HRI video representations. It adapts pretrained models to extract behavioral, affective, and visual signals, which form the multimodal features. These features are then fed into a neural network composed of a self-attention mechanism and average pooling for individual learning, a graph attention network for group learning, and a fully connected layer to estimate engagement. We tested the proposed method on 43 in-the-wild multiparty elderly robot interaction (ERI) videos. The experimental results show that our method effectively detects the key participants and estimates the engagement state of the elderly. Our study also demonstrates that signals from side-participants in the main interaction group contribute considerably to the EE of the elderly in multiparty ERI.
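To make the described architecture concrete, the following is a minimal illustrative sketch in PyTorch of a pipeline with the same shape: per-participant temporal self-attention with average pooling, a graph attention step over the participant group, and a fully connected engagement head. It is an assumption-laden sketch, not the authors' implementation; the feature dimension, number of engagement classes, fully connected participant graph, and the simplified single-layer GAT-style scoring are all placeholders.

    # Hypothetical sketch of the described pipeline (not the paper's code).
    # Assumes each video yields per-participant multimodal feature sequences
    # of shape (num_participants, seq_len, feat_dim).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EngagementEstimator(nn.Module):
        def __init__(self, feat_dim=256, num_heads=4, num_classes=3):
            super().__init__()
            # Individual learning: self-attention over each participant's
            # temporal feature sequence, followed by average pooling.
            self.self_attn = nn.MultiheadAttention(
                feat_dim, num_heads, batch_first=True)
            # Group learning: one graph-attention layer over a fully
            # connected participant graph (simplified GAT-style scoring).
            self.attn_src = nn.Linear(feat_dim, 1)
            self.attn_dst = nn.Linear(feat_dim, 1)
            self.gat_proj = nn.Linear(feat_dim, feat_dim)
            # Engagement head: fully connected layer over each
            # participant's group-aware embedding.
            self.fc = nn.Linear(feat_dim, num_classes)

        def forward(self, x):
            # x: (num_participants, seq_len, feat_dim)
            attn_out, _ = self.self_attn(x, x, x)     # temporal self-attention
            h = attn_out.mean(dim=1)                  # average pooling -> (P, D)

            z = self.gat_proj(h)
            # Pairwise attention logits e[i, j] between participants i and j.
            e = self.attn_src(z) + self.attn_dst(z).transpose(0, 1)
            alpha = F.softmax(F.leaky_relu(e), dim=-1)  # weights over neighbors
            g = alpha @ z                               # aggregate group context

            return self.fc(g)                           # per-participant logits

    # Usage: 4 participants, 30 frames of 256-d multimodal features each.
    feats = torch.randn(4, 30, 256)
    logits = EngagementEstimator()(feats)
    print(logits.shape)  # torch.Size([4, 3])

Note that the group-attention step is what lets side-participants' signals influence the target participant's engagement estimate, matching the abstract's finding about their contribution.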
