Abstract
In emotion recognition, recognizing emotions in data from movies or other spontaneous scenes is more difficult than recognizing them in data recorded under laboratory conditions. Based on the AFEW 6.0 database, we present a bimodal emotion recognition approach that uses convolutional neural networks and feature fusion. First, we crop the facial images and extract the audio emotion data from the videos. Then, the convolutional neural network method, the Gabor method, and the openSMILE tool are used to extract the corresponding features, and three fusion methods, namely principal component analysis (PCA) fusion, kernel cross-modal factor analysis (KCFA) fusion, and sparse kernel reduced-rank regression (SKRRR) fusion, are used to integrate the foregoing facial and audio features at the feature level. Finally, results on the AFEW 6.0 database show that, with an SVM classifier, the PCA fusion method and the SKRRR fusion method reach accuracy rates of 53.46% and 50.93%, respectively, both higher than the EmotiW 2016 baseline accuracy of 40.47%.
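As a rough illustration of feature-level fusion, the sketch below standardizes two per-sample feature matrices, concatenates them, and projects the result with PCA. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `pca_fuse`, the feature dimensions, the number of components, and the random placeholder data are all hypothetical, and the abstract does not specify how the PCA fusion is configured.

```python
import numpy as np

def pca_fuse(face_feats, audio_feats, n_components):
    """Concatenate two per-sample feature matrices and project with PCA.

    Hypothetical helper: one common way to do feature-level fusion,
    assumed here for illustration only.
    """
    def zscore(x):
        std = x.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant columns
        return (x - x.mean(axis=0)) / std

    # Standardize each modality so neither dominates, then stack side by side.
    joint = np.hstack([zscore(face_feats), zscore(audio_feats)])
    joint = joint - joint.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(joint, full_matrices=False)
    return joint @ vt[:n_components].T

# Placeholder data standing in for real features (assumed sizes).
rng = np.random.default_rng(0)
face = rng.normal(size=(20, 64))   # hypothetical facial features
audio = rng.normal(size=(20, 32))  # hypothetical audio features
fused = pca_fuse(face, audio, n_components=10)
print(fused.shape)  # (20, 10)
```

The fused vectors would then be passed to a classifier such as an SVM; that step is omitted here to keep the sketch dependent on NumPy only.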