Abstract

Eye contact detection in group conversations is key to developing artificial mediators that can understand and interact with a group. In this paper, we propose to model a group's appearance and behavioral features to perform eye contact detection for each participant in the conversation. Specifically, we extract each participant's appearance features at the detection moment, and we extract their behavioral features from a motion history image that encodes the participant's body movements within a small time window before the detection moment. To obtain representative features from these images, we train a Convolutional Neural Network (CNN) to model them. The features learned by the network achieve an accuracy of 0.60 on the validation set of the eye contact detection challenge at ACM MM 2021. Furthermore, our experimental results demonstrate that using both appearance and behavioral features leads to higher eye contact detection accuracy than using either one alone.
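The abstract does not give implementation details for the motion history image, so the sketch below shows only the standard MHI update rule (each pixel holds the timestamp of its most recent motion, and stale entries outside a fixed duration window are cleared); the function name, threshold, and window length are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def update_mhi(mhi, frame_diff, timestamp, duration, threshold=32):
    """Update a motion history image (MHI) in place.

    mhi        -- float array holding, per pixel, the timestamp of the
                  most recent motion observed at that pixel
    frame_diff -- absolute difference between consecutive frames
    timestamp  -- current time (e.g., frame index)
    duration   -- how long past motion remains visible in the MHI
    threshold  -- minimum difference counted as motion (assumed value)
    """
    motion = frame_diff > threshold
    # Stamp moving pixels with the current time.
    mhi[motion] = timestamp
    # Fade out pixels whose last motion fell outside the window.
    mhi[~motion & (mhi < timestamp - duration)] = 0
    return mhi

# Toy usage: a single moving pixel appears at t=10, then stays still.
mhi = np.zeros((4, 4), dtype=np.float32)
diff = np.zeros((4, 4), dtype=np.float32)
diff[1, 1] = 100.0
update_mhi(mhi, diff, timestamp=10, duration=5)
update_mhi(mhi, np.zeros((4, 4), dtype=np.float32), timestamp=12, duration=5)
# Pixel (1, 1) still carries its timestamp (12 - 10 <= 5)...
print(mhi[1, 1])
update_mhi(mhi, np.zeros((4, 4), dtype=np.float32), timestamp=20, duration=5)
# ...but is cleared once the window has passed.
print(mhi[1, 1])
```

The resulting single-channel image summarizes recent body movement as a fading trail, which is what lets a CNN read short-term behavior from one static input.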
