Abstract

Deep neural networks have been used successfully for several computer vision tasks, including facial expression recognition. In spite of the good results, it is still not clear why these networks achieve such high recognition rates. One way to learn more about deep neural networks is to visualise and understand what they are learning, and techniques such as deconvolution can play a significant role in doing so. In this paper, we train a Convolutional Neural Network (CNN) and a Lateral Inhibition Pyramidal Neural Network (LIPNet) to learn facial expressions. We then use the deconvolution process to visualise the learned features of the CNN, and we introduce a novel mechanism for visualising the internal representation of the LIPNet. We perform a series of experiments, training our networks on the Cohn-Kanade data set, and show which facial structures compose the learned emotion expression representation. We then use the trained networks to recognise images from the JAFFE data set and demonstrate that the learned representations are present in different face images, emphasising the generalisation capability of these networks. We discuss the different representations that each network learns and how they differ from each other. We also discuss how each learned representation contributes to the recognition process and how it can be related to the Facial Action Coding System (FACS) emotion annotation. Finally, we explain how the principles of invariance, redundancy and filtering, common in deep networks, contribute to the learned features and to the facial expression recognition task in general.
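To illustrate the kind of deconvolution-style visualisation the abstract refers to, the sketch below projects a single convolutional feature map back to input space using plain backpropagation in PyTorch. The small network, the layer choice and the random input are hypothetical placeholders, not the architecture or data used in the paper, and gradient back-projection is only an approximation of a full Zeiler-Fergus deconvnet (which additionally swaps in max-unpooling and transposed-convolution layers).

    import torch
    import torch.nn as nn

    # Minimal sketch (not the authors' code): visualise what a conv layer has
    # learned by pushing one feature map's activation back to the input image.
    # Hypothetical small CNN standing in for an expression classifier.
    model = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(8, 16, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(2),
    )
    model.eval()

    # Stand-in for a 64x64 grayscale face crop.
    face = torch.randn(1, 1, 64, 64, requires_grad=True)

    # Hook the layer whose learned features we want to inspect.
    activations = {}
    def save_activation(module, inputs, output):
        activations["feat"] = output
    model[3].register_forward_hook(save_activation)

    model(face)

    # Pick one feature map and backpropagate its summed activation.
    feature_map = activations["feat"][0, 0]
    feature_map.sum().backward()

    # The gradient image highlights the facial structures driving that feature.
    visualisation = face.grad[0, 0].abs()
    print(visualisation.shape)  # torch.Size([64, 64])

The resulting gradient map can be overlaid on the input face to see which regions (eyes, mouth corners, brow) the chosen feature responds to, which is the spirit of the visualisations discussed in the paper.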
