Abstract

This work concludes the first study on mouth-based emotion recognition adopting a transfer learning approach. Transfer learning is paramount for mouth-based emotion recognition because few datasets are available, and most of them include emotional expressions simulated by actors instead of real-world categorisations. With transfer learning we need less training data than training a whole network from scratch, and we can therefore fine-tune the network with emotional data more efficiently and improve the convolutional neural network's accuracy in the desired domain. The proposed approach aims at improving emotion recognition dynamically, taking into account not only new scenarios but also situations that differ from the initial training phase, because the image of the mouth can be available even when the whole face is visible only from an unfavourable perspective. Typical applications include the automated supervision of bedridden critical patients in a healthcare management environment, and portable applications supporting disabled users who have difficulties in seeing or recognising facial emotions. This achievement builds on previous preliminary works on mouth-based emotion recognition using deep learning, and has the further benefit of having been tested and compared against a set of other networks on an extensive dataset for face-based emotion recognition that is well known in the literature. The accuracy of mouth-based emotion recognition was also compared to the corresponding full-face emotion recognition; we found that the loss in accuracy is largely compensated by consistent performance in the visual emotion recognition domain. We can therefore state that our method proves the importance of mouth detection in the complex process of emotion recognition.
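For illustration, the following minimal sketch shows how such a transfer-learning setup can be assembled: a network pretrained on a large generic dataset is frozen, its classification head is replaced and fine-tuned on mouth images. The VGG16 backbone, the six-class label set, the dataset path and the hyper-parameters are assumptions made for the example, not details prescribed by the paper.

```python
# Minimal transfer-learning sketch (assumptions: torchvision VGG16 backbone,
# six emotion classes; dataset path and hyper-parameters are illustrative).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_EMOTIONS = 6  # assumed label set of basic emotions

# Load a network pretrained on a large generic dataset (ImageNet).
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the convolutional feature extractor: only the new head is trained,
# so far less emotional data is needed than training from scratch.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final classifier layer with one sized for the emotion classes.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_EMOTIONS)

# Standard preprocessing for mouth-crop images resized to the backbone's input.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("mouth_crops/train", transform=preprocess)  # hypothetical path
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Fine-tune only the unfrozen parameters on the emotional data.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```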

Highlights

  • Visual emotion recognition (ER) has been widely studied as one of the first affective computing techniques; based on visual features of the face, it combines features of the eyes, the mouth and other facial elements at the same time

  • The mouth extraction from the images of the raw dataset was carried out using the pre-trained shape_predictor_68_face_landmarks.dat model [2,8,27], which outputs 68 landmarks detected for each image

  • Once we obtained the landmarks of a face, we used those identifying the mouth area to crop the image, as sketched in the example after this list
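A minimal sketch of this landmark-based mouth cropping is given below, assuming the dlib library together with OpenCV for image handling; the file paths and the margin size are illustrative, and indices 48-67 are the standard mouth points in the 68-landmark scheme.

```python
# Minimal sketch: mouth cropping from the 68 dlib facial landmarks.
# Assumptions: dlib and OpenCV installed; file paths and margin are illustrative.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_mouth(image_path, margin=10):
    """Return the mouth region of the first detected face, or None."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    # Landmarks 48-67 describe the outer and inner contours of the mouth.
    shape = predictor(gray, faces[0])
    xs = [shape.part(i).x for i in range(48, 68)]
    ys = [shape.part(i).y for i in range(48, 68)]
    x0, x1 = max(min(xs) - margin, 0), min(max(xs) + margin, image.shape[1])
    y0, y1 = max(min(ys) - margin, 0), min(max(ys) + margin, image.shape[0])
    return image[y0:y1, x0:x1]

mouth = crop_mouth("face.jpg")  # hypothetical input image
if mouth is not None:
    cv2.imwrite("mouth.jpg", mouth)
```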


Summary

Introduction

Visual emotion recognition (ER) has been widely studied as one of the first affective computing techniques; based on visual features of the face, it combines features of the eyes, the mouth and other facial elements at the same time. Studies using only the mouth for facial emotion recognition have obtained promising results, while still not gaining proper recognition within the state of the art. Such works used convolutional neural networks (CNNs) to detect basic emotions from innovative and ubiquitous devices, e.g., smartphone or computer cameras, to produce textual, audio or visual feedback for humans, or digital outputs supporting other services, mainly for healthcare systems [4]. A neural network can obtain excellent results with a relatively small dataset of images when trained on a single individual, e.g., to detect particular states needing immediate medical intervention, or changes over time indicating an underlying degenerative health condition.

