Abstract

Human emotions play a significant role in everyday life, and automatic emotion recognition has many applications in medicine, e-learning, monitoring, marketing, etc. In this paper, a method and neural network architecture for real-time human emotion recognition from audio-visual data are proposed. Deep neural networks, namely convolutional and recurrent networks, are used to classify one of seven emotions. Visual information is represented by a sequence of 16 frames of 96 × 96 pixels, and audio information by 140 features for each of a sequence of 37 temporal windows. An autoencoder is used to reduce the number of audio features. Combining audio information with visual information is shown to increase recognition accuracy by up to 12%. The developed system is undemanding in computing resources and is flexible in the selection of parameters, in reducing or increasing the number of emotion classes, and in the ability to easily add, accumulate, and use information from other external devices to further improve classification accuracy.
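To make the input dimensions concrete, the two streams described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the 16 × 96 × 96 video tensor and the 37 × 140 audio tensor follow the abstract, while the encoder weights are random placeholders standing in for a trained autoencoder's encoder, and the code size of 64 is an assumption (the paper does not state it here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes from the paper: 16 grayscale frames of 96 x 96 pixels,
# and 140 audio features for each of 37 temporal windows.
video = rng.random((16, 96, 96))
audio = rng.random((37, 140))

# Hypothetical autoencoder encoder: reduces the 140 audio features
# per window to a smaller code before fusion with the visual stream.
W_enc = rng.standard_normal((140, 64)) * 0.1
audio_codes = np.tanh(audio @ W_enc)   # shape (37, 64)

print(video.shape, audio.shape, audio_codes.shape)
```

In the full system these two streams would feed a convolutional network (per frame) and a recurrent network (over the temporal dimension), respectively, before a joint classification over the seven emotion classes.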

Highlights

  • Artificial intelligence technologies are actively used in everyday life, including a wide range of software and hardware systems that can change their behavior during operation

  • Feature extraction based on deep learning often solves a given problem more effectively than a human, especially with large amounts of data, where it is necessary to detect non-linear relationships between variables

  • To use a video channel for emotion recognition, one must select a sequence of images where a face is present in each image, and all images are of good quality

Introduction

Artificial intelligence technologies are actively used in everyday life, including a wide range of software and hardware systems that can change their behavior during operation. Feature extraction based on deep learning often solves a given problem more effectively than a human, especially with large amounts of data, where it is necessary to detect non-linear relationships between variables. One such example is the recognition of faces and their individual details in images. In this paper, a system based on deep neural networks has been developed that recognizes human emotions in real time, with limited computing resources, from a video sequence containing both visual and speech information. An advantage of the system is its robustness, since it was trained mainly on raw examples rather than on video selected and prepared in a special way.

Related work
Proposed method
Image selection from video sequence
Collecting a dataset of blurry and clear images
Image type determination
Audio features selection
Parameter covariance
Autoencoder
The structure of neural networks
Analysis of the results
Conclusion
Findings
References
