Abstract

People generally perceive other people's emotions from speech and facial expressions, so it can be helpful to use speech signals and facial images simultaneously. However, because the characteristics of speech and image data differ, combining the two inputs remains a challenging issue in emotion-recognition research. In this paper, we propose a method to recognize emotions by synchronizing speech signals and image sequences. We design three deep networks. The first network is trained on image sequences, focusing on facial expression changes. Facial landmarks are input to the second network to reflect facial motion. The speech signals are first converted to acoustic features, which serve as the input of the third network, synchronized with the image sequence. These three networks are combined using a novel integration method to boost the performance of emotion recognition. An accuracy comparison is conducted to verify the proposed method. The results demonstrate that the proposed method achieves more accurate performance than previous studies.
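
As an illustration of the three-branch design described above, the following is a minimal PyTorch sketch of late fusion across an image-sequence branch, a landmark branch, and a synchronized acoustic-feature branch. All layer sizes, branch architectures, and the learned fusion weights are illustrative assumptions, not the paper's exact networks or integration method.

```python
import torch
import torch.nn as nn

class ThreeBranchEmotionNet(nn.Module):
    """Hypothetical sketch: three branches fused by learned softmax weights.
    Branch architectures and sizes are illustrative, not the paper's design."""

    def __init__(self, num_emotions=7, landmark_dim=68 * 2, acoustic_dim=40):
        super().__init__()
        # Branch 1: image sequence (B, T, C, H, W) -> per-frame CNN features, averaged over time.
        self.image_cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),  # -> (B*T, 64)
        )
        # Branch 2: facial landmark trajectories (B, T, landmark_dim) -> GRU.
        self.landmark_rnn = nn.GRU(landmark_dim, 64, batch_first=True)
        # Branch 3: acoustic features synchronized to the frames (B, T, acoustic_dim) -> GRU.
        self.audio_rnn = nn.GRU(acoustic_dim, 64, batch_first=True)
        # One classifier head per branch.
        self.heads = nn.ModuleList([nn.Linear(64, num_emotions) for _ in range(3)])
        # Learnable integration weights over the three branch predictions (assumption).
        self.fusion_logits = nn.Parameter(torch.zeros(3))

    def forward(self, images, landmarks, acoustics):
        b, t = images.shape[:2]
        img_feat = self.image_cnn(images.flatten(0, 1)).view(b, t, -1).mean(dim=1)
        _, lm_h = self.landmark_rnn(landmarks)
        _, au_h = self.audio_rnn(acoustics)
        logits = [head(f) for head, f in zip(self.heads, (img_feat, lm_h[-1], au_h[-1]))]
        w = torch.softmax(self.fusion_logits, dim=0)  # weighted integration of branch outputs
        return sum(wi * li for wi, li in zip(w, logits))

# Example forward pass with random tensors (batch of 2, 16 synchronized frames).
net = ThreeBranchEmotionNet()
out = net(torch.randn(2, 16, 3, 64, 64), torch.randn(2, 16, 136), torch.randn(2, 16, 40))
print(out.shape)  # torch.Size([2, 7])
```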

Highlights

  • High-performance personal computers have rapidly become widespread with the technological development of the information society

  • The speech signals are first converted to acoustic features, which serve as the input of the third network, synchronized with the image sequence

  • Many emotion-recognition systems need to determine whether a given speech signal and image sequence belong to the acting section or the silence section (a minimal sketch of this check follows the list)
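
To illustrate the acting/silence split mentioned in the last highlight, below is a minimal sketch of a short-time energy gate over audio aligned to the video frames. The threshold, frame rate, and function name are assumptions for illustration, not the paper's actual segmentation method.

```python
import numpy as np

def label_acting_sections(audio, sample_rate=16000, fps=30, energy_thresh_db=-40.0):
    """Hypothetical sketch: mark each video frame as 'acting' (speech present)
    or 'silence' using the RMS energy of the audio samples aligned to it."""
    hop = sample_rate // fps                      # audio samples per video frame
    n_frames = len(audio) // hop
    labels = []
    for i in range(n_frames):
        chunk = audio[i * hop:(i + 1) * hop]
        rms = np.sqrt(np.mean(chunk ** 2) + 1e-12)
        db = 20.0 * np.log10(rms + 1e-12)         # energy in dB (full scale = 0 dB)
        labels.append(db > energy_thresh_db)      # True -> acting, False -> silence
    return np.array(labels)

# Example: half a second of silence followed by half a second of noise, at 30 fps.
audio = np.concatenate([np.zeros(8000), 0.5 * np.random.randn(8000)])
print(label_acting_sections(audio).astype(int))   # mostly 0s, then mostly 1s
```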

Introduction

High-performance personal computers have rapidly become widespread with the technological development of the information society. The convolutional neural network (CNN) is the most popular among deep-learning models: it convolves input images through many filters and automatically produces feature maps. Various studies have combined facial features with deep-learning-based models to boost the performance of facial expression recognition [24, 38, 46]. However, because the characteristics of speech signals and image sequences are different, combining the two inputs is still a challenging issue in emotion-recognition research. We therefore propose a method to recognize emotions by synchronizing speech signals and image sequences: the speech signals are first converted to acoustic features, which serve as the input of the audio network, synchronized with the image sequence.
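
The following is a minimal sketch of that synchronization step, assuming a 30 fps image sequence and MFCC acoustic features extracted with librosa. The sampling rate, frame rate, and feature dimensionality are illustrative assumptions rather than the paper's exact settings.

```python
import librosa

def synchronized_mfcc(wav_path, fps=30, sr=16000, n_mfcc=40):
    """Hypothetical sketch: extract roughly one MFCC vector per video frame so
    the acoustic features line up with the image sequence."""
    y, sr = librosa.load(wav_path, sr=sr)
    hop = sr // fps                               # audio samples per video frame
    # One MFCC column per hop of audio -> approximately one column per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop)
    return mfcc.T                                 # shape: (num_frames, n_mfcc)

# Usage (path is a placeholder): features[i] pairs with video frame i.
# features = synchronized_mfcc("clip.wav")
# print(features.shape)
```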

The remainder of the paper is organized as follows:

  • Facial emotion recognition
  • Audio emotion recognition
  • Multimodal emotion recognition
  • Preprocessing
  • Image-based model
  • Weighted joint fine-tuning
  • Baselines
  • Feature concatenation
  • Joint fine-tuning
  • Results
  • Integration method
  • Conclusions