Abstract
Continuous detection of social interactions from wearable sensor data streams has a range of potential applications in domains including health and social care, security, and assistive technology. We contribute an annotated, multimodal data set capturing such interactions using video, audio, GPS, and inertial sensing. We present methods for automatic detection and temporal segmentation of focused interactions using support vector machines and recurrent neural networks with features extracted from both audio and video streams. A focused interaction occurs when co-present individuals, having a mutual focus of attention, interact by first establishing face-to-face engagement and direct conversation. We describe an evaluation protocol, including framewise, extended framewise, and event-based measures, and provide empirical evidence that the fusion of visual face track scores with audio voice activity scores provides an effective combination. The methods, contributed data set, and protocol together provide a benchmark for future research on this problem. The data set is available at https://doi.org/10.15132/10000134.
Highlights
We consider automatic detection of social interactions by analysis of wearable sensor data
We report results for detecting focused interactions using more data, temporal filtering, and Long Short-Term Memory (LSTM) recurrent neural networks as well as Support Vector Machines (SVMs) using audio-only, video-only, and audio-visual features
By analysing the performance of both SVMs and LSTM recurrent neural networks (RNNs) with audio, visual, and audio-visual features, we aim to obtain a deeper understanding of our application and data set, and to provide more comprehensive benchmarking for future research
Summary
We consider automatic detection of social interactions by analysis of wearable sensor data. Focused interaction occurs when two or more co-present individuals, having mutual focus of attention, interact by establishing face-to-face engagement and direct conversation [1]. Face-to-face engagement is often not maintained throughout the entirety of a focused interaction; for example, a group of people in conversation will typically look at each other only intermittently. This concept of focused interaction is more specific than that of social interaction, which can be considered to occur whenever individuals communicate and interact with one another, whether or not they are physically co-present (e.g. by telephone) [2]. Individuals in an unfocused interaction are aware of each other's presence but establish only indirect engagement, which might involve, for example, brief eye contact or facial expressions.
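Because face-to-face engagement is only intermittent, a per-frame visual cue on its own can drop out in the middle of an ongoing interaction. The sketch below illustrates, on made-up scores, how combining a face score with a voice activity score and smoothing over time can bridge such a gap. It is not the method of the paper, which uses SVMs and LSTM RNNs on audio and video features: the weighted-average fusion, the median filter, the threshold of 0.5, and all function names here are assumptions chosen purely for illustration.

```python
from statistics import median


def fuse_scores(face, voice, w_face=0.5):
    """Weighted average of per-frame face and voice scores (assumed in [0, 1])."""
    if len(face) != len(voice):
        raise ValueError("score sequences must have the same length")
    return [w_face * f + (1.0 - w_face) * v for f, v in zip(face, voice)]


def median_filter(scores, window=3):
    """Sliding median over an odd-sized window, truncated at the edges."""
    half = window // 2
    return [
        median(scores[max(0, i - half): i + half + 1])
        for i in range(len(scores))
    ]


def segments(labels):
    """Return (start, end) frame index pairs, end exclusive, for runs of True."""
    found, start = [], None
    for i, active in enumerate(labels):
        if active and start is None:
            start = i
        elif not active and start is not None:
            found.append((start, i))
            start = None
    if start is not None:
        found.append((start, len(labels)))
    return found


def detect(face, voice, threshold=0.5, window=3):
    """Fuse, smooth, threshold, and segment into candidate interactions."""
    smoothed = median_filter(fuse_scores(face, voice), window)
    return segments([s >= threshold for s in smoothed])


# Hypothetical scores: the face cue drops out at frame 2 while speech continues.
face = [0.9, 0.9, 0.1, 0.9, 0.9, 0.0, 0.0, 0.0]
voice = [0.8, 0.8, 0.8, 0.8, 0.8, 0.1, 0.1, 0.1]

unsmoothed = segments([s >= 0.5 for s in fuse_scores(face, voice)])
print(unsmoothed)          # [(0, 2), (3, 5)]
print(detect(face, voice))  # [(0, 5)]
```

Without smoothing, the single-frame loss of the face cue splits one interaction into two segments; with the three-frame median filter the gap is bridged and a single segment is returned. An event-based measure would penalise the split, whereas a framewise measure would count only the one misclassified frame.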