Abstract

Deep neural networks have proven to be efficient systems for learning complex data representations. However, one of their main limitations is their inability to cope with changes in the data distribution. For instance, in real-time facial expression recognition, the data used to evaluate a model commonly differs in quality from the data used to train it, leading to poor generalization performance. In this work we propose a novel deep Convolutional Neural Network (CNN) architecture, pre-trained as a Stacked Convolutional Autoencoder (SCAE), to address emotion recognition in unconstrained environments. The SCAE combines convolutional and fully connected layers and is trained in a greedy layer-wise unsupervised fashion; it learns to encode facial expression images as a feature vector that is invariant to illumination and facial pose. The CNN achieves a state-of-the-art classification rate of 99.52% on a combined corpus of gamma-corrected versions of the CK+, JAFFE, FEEDTUM and KDEF datasets. When evaluated on unseen data obtained in unconstrained environments, our approach achieves 79.75%, an increase of over 28% compared to a CNN without our pre-training approach, supporting the methodology proposed in this work.
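
The abstract mentions that the training corpus consists of gamma-corrected images but does not give the details. As a minimal sketch of what such a preprocessing step might look like, the following applies a standard power-law (gamma) transform to 8-bit grayscale pixel values. The function name gamma_correct, the default gamma of 2.2, and the 1/gamma convention are illustrative assumptions, not values taken from this work.

```python
def gamma_correct(pixels, gamma=2.2, max_value=255):
    """Apply power-law (gamma) correction to a 2-D list of grayscale intensities.

    Hypothetical helper: gamma=2.2 and the 1/gamma exponent are assumptions
    chosen for illustration; the source does not state which values it used.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute a lookup table: out = max * (in / max) ** (1 / gamma).
    lut = [
        round(max_value * (value / max_value) ** (1.0 / gamma))
        for value in range(max_value + 1)
    ]
    # Map every pixel through the table, preserving the row structure.
    return [[lut[p] for p in row] for row in pixels]


if __name__ == "__main__":
    image = [[0, 64, 128], [192, 255, 32]]  # toy 2x3 grayscale image
    print(gamma_correct(image))
```

With a gamma greater than 1 under this convention, dark and mid-range intensities are brightened while pure black and pure white are left unchanged, which is one common way to reduce illumination differences between images from different sources.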
