Abstract

Due to the highly variant face geometry and appearances, Facial Expression Recognition (FER) is still a challenging problem. CNN can characterize 2-D signals. Therefore, for emotion recognition in a video, the authors propose a feature selection model in AlexNet architecture to extract and filter facial features automatically. Similarly, for emotion recognition in audio, the authors use a deep LSTM-RNN. Finally, they propose a probabilistic model for the fusion of audio and visual models using facial features and speech of a subject. The model combines all the extracted features and use them to train the linear SVM (Support Vector Machine) classifiers. The proposed model outperforms the other existing models and achieves state-of-the-art performance for audio, visual and fusion models. The model classifies the seven known facial expressions, namely anger, happy, surprise, fear, disgust, sad, and neutral on the eNTERFACE’05 dataset with an overall accuracy of 76.61%.

Highlights

  • Computer vision, in recent years, has witnessed outstanding and productive outcomes because of the tasks like face recognition, emotion recognition, and speech recognition

  • A fusion-based multi-modal emotion recognition system is proposed in this paper

  • The paper demonstrates a probabilistic audio-visual fusion model using SVM machine learning classifier which classifies the emotions into the classes mentioned above with better accuracy as compared to the previous works in the literature

Read more

Summary

Introduction

In recent years, has witnessed outstanding and productive outcomes because of the tasks like face recognition, emotion recognition, and speech recognition. The reason is the adaptation of high-end techniques like machine learning. Human expression recognition is still an onerous task. The first Emotion Recognition in Wild (EmotiW) (Dhall et al, 2013) challenge was held in the year 2013. There are several reasons in the past for low accuracy percentage such as there is a lack of labeled video datasets, the nature of facial expressions is ambiguous, and the effectiveness of the methods of extracting facial expression is less. In the last few years, techniques like Deep Convolutional Neural Network (DCNN) (Schmidhuber, 2015) is proven to be outstanding in extracting features from an image.

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call