Automatic Gender and Identity Recognition in Annotated Multimodal Face-to-face Conversations

Costanza Navarretta

doi:10.1109/coginfocom.2018.8639905

Abstract

This paper addresses the automatic recognition of the gender and identity of speakers in spontaneous dyadic conversations using information about the multimodal communicative behavior of the participants. Identifying gender-specific or individual-specific behaviors in face-to-face communication is relevant for constructing advanced and robust interactive systems. This information also contributes to understanding how humans communicate face-to-face. In the present work, classifiers have been trained on features extracted from an annotated multimodal corpus of twelve first encounters in order to distinguish the gender and the identity of the participants. The training features comprise speech duration and shape annotations of co-speech communicative head movements, facial expressions, body postures and hand gestures of six female and six male participants. Information about the emotions shown by the participants' facial expressions was also added to the training set. Unlike other studies that address the recognition of individuals for security systems using purpose-built databases, this study uses multimodal training features that are exclusively related to communication, and the data are spontaneously occurring conversations, since the focus is on multimodal communication. A number of classifiers were trained on the data. The best results were obtained by a multilayer perceptron for gender recognition, with a weighted F-score of 0.65 (accuracy 64%), and by multinomial logistic regression for the classification of the 12 participants, with an F-score of 0.31 (accuracy 30%). The most useful features for gender recognition were information about the emotions shown by the participants, the type of head movements and handedness, while the most useful features for the identification of individuals were emotions, head movements, handedness and body direction.
The results on both tasks are significantly better than chance accuracy and than the results obtained by a majority classifier. This is promising, since this is a first pilot study on a corpus of limited size. In the future, the features addressed in this study could be combined with other biometric patterns, such as those used in multimedia security systems.
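
To make the evaluation terms above concrete, the following is a minimal sketch of how a weighted F-score, a majority-classifier baseline and chance accuracy can be computed. It is not the author's evaluation code: the function names and the toy labels are hypothetical, and chance accuracy is taken as 1/k for k classes, which assumes uniformly distributed classes (the number of annotated instances per participant in the corpus may differ).

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1
    return score


def majority_baseline(y_true):
    """Predict the most frequent label in y_true for every instance."""
    label, _ = Counter(y_true).most_common(1)[0]
    return [label] * len(y_true)


def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing (assumes balanced classes)."""
    return 1.0 / n_classes


# Toy, hypothetical labels; NOT data from the corpus.
y_true = ["F", "F", "F", "M", "M", "M"]
y_pred = ["F", "F", "M", "M", "M", "F"]

model_f1 = weighted_f1(y_true, y_pred)
baseline_f1 = weighted_f1(y_true, majority_baseline(y_true))
gender_chance = chance_accuracy(2)
identity_chance = chance_accuracy(12)
```

Under the uniform assumption, chance accuracy is 50% for the two-class gender task and about 8.3% for the twelve-class identity task, which gives a reference point for the reported accuracies of 64% and 30%.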

Full Text