Abstract

During spontaneous conversations, both the articulation process and internal emotional states influence facial configurations. Inferring the conveyed emotions from the information presented in facial expressions therefore requires decoupling the linguistic and affective messages in the face. Normalizing and compensating for the underlying lexical content has been shown to improve facial expression recognition. However, this approach requires transcription and phoneme-alignment information, which is not available in a broad range of applications. This study uses an asymmetric bilinear factorization model to decouple linguistic and affective information when such annotations are not given. Emotion recognition evaluations on the IEMOCAP database show that the proposed approach can separate these factors in facial expressions, yielding statistically significant performance improvements. The achieved improvement is similar to the case in which the ground-truth phonetic transcription is known. Likewise, experiments on the SEMAINE database using image-based features demonstrate the effectiveness of the proposed technique in practical scenarios.
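Asymmetric bilinear factorization is a style/content separation technique in the spirit of Tenenbaum and Freeman's bilinear models. The sketch below is a minimal illustration of that general idea, assuming emotion classes play the role of "style" and lexical (viseme/phoneme) classes the role of "content"; the function name, dimensions, and SVD-based fitting shown here are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, J):
    """Fit Y ~ A @ B with a truncated SVD.

    Y : (S*K, C) observation matrix of K-dim facial features,
        stacked row-wise by S style (emotion) classes, with one
        column per content (lexical) class.
    Returns A (S*K, J): style-specific bases (one K x J block per
    style), and B (J, C): content vectors shared across styles.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :J] * s[:J]   # absorb singular values into the style factor
    B = Vt[:J, :]          # content representation, style-independent
    return A, B

# Toy example: 2 emotion "styles", 5-dim features, 4 content classes.
rng = np.random.default_rng(0)
Y = rng.standard_normal((2 * 5, 4))
A, B = fit_asymmetric_bilinear(Y, J=3)
print(np.linalg.norm(A @ B - Y))  # residual of the rank-3 approximation
```

Under this factorization, the content vectors B capture variation tied to the lexical class regardless of emotion, so a downstream emotion recognizer can operate on the style factor with the linguistic variability largely factored out.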
