Abstract

Micro-Expression (ME) recognition is a hot topic in computer vision, as it offers a gateway to capturing and understanding everyday human emotions. It is nonetheless a challenging problem because MEs are typically transient (lasting less than 200 ms) and subtle. Recent advances in machine learning enable new and effective methods to be adopted for diverse computer vision tasks. In particular, deep learning techniques trained on large datasets outperform classical machine learning approaches that rely on hand-crafted features. Even though available datasets of spontaneous MEs are scarce and much smaller, off-the-shelf Convolutional Neural Networks (CNNs) still achieve satisfactory classification results. However, these networks are demanding in terms of memory and computational resources. This poses great challenges when deploying CNN-based solutions in applications such as driver monitoring and comprehension recognition in virtual classrooms, which demand fast and accurate recognition. As these networks were initially designed for tasks in other domains, they are over-parameterized and need to be optimized for ME recognition. In this paper, we propose a new network based on the well-known ResNet18, which we optimized for ME classification in two ways. Firstly, we reduced the depth of the network by removing residual layers. Secondly, we introduced a more compact representation of the optical flow used as input to the network. We present extensive experiments and demonstrate that the proposed network obtains accuracies comparable to state-of-the-art methods while significantly reducing the required memory. Our best classification accuracy was 60.17% on the challenging composite dataset containing five objective classes. Our method takes only 24.6 ms to classify an ME video clip (less than the duration of the shortest ME, which lasts 40 ms). Our CNN design is therefore suitable for real-time embedded applications with limited memory and computing resources.
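
As a rough illustration of the two optimizations above, the sketch below (assuming PyTorch and torchvision) adapts the ResNet18 stem to a two-channel optical-flow input and keeps only the first two residual stages; the exact cut point, input encoding, and layer choices here are assumptions for illustration, not the paper's precise configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ShallowFlowResNet(nn.Module):
    """Illustrative sketch: a depth-reduced ResNet18 taking a compact
    optical-flow map as input. Cutting after layer2 and using a 2-channel
    input are assumptions, not the paper's exact design."""

    def __init__(self, num_classes=5, in_channels=2):
        super().__init__()
        base = resnet18(weights=None)  # torchvision >= 0.13 API
        # Replace the 3-channel RGB stem with an optical-flow stem.
        self.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = base.bn1
        self.relu = base.relu
        self.maxpool = base.maxpool
        # Keep only the first two residual stages to reduce depth and memory.
        self.layer1 = base.layer1
        self.layer2 = base.layer2
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, num_classes)  # layer2 outputs 128 channels

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer2(self.layer1(x))
        x = torch.flatten(self.avgpool(x), 1)
        return self.fc(x)
```

A model of this shape accepts an (N, 2, H, W) flow tensor and produces logits over the five objective classes mentioned above.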

Highlights

  • Emotion recognition has received much attention in the research community in recent years. Among the several sub-fields of emotion analysis, studies of facial expression recognition are active [1,2,3,4]

  • Ekman developed the Facial Action Coding System (FACS) to describe the facial muscle movements according to the action units, i.e., the fundamental actions of individual muscles or groups of muscles that can be combined to represent each of the facial expressions

  • Inspired by existing works [27,29], we explored different Convolutional Neural Network (CNN) architectures and several optical flow representations for CNN inputs to find cost-effective neural network architectures capable of recognizing MEs in real time (an illustrative input-preprocessing sketch follows this list)
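
As a rough sketch of how such a flow-based input could be prepared (assuming OpenCV and the Farneback method; the paper's actual flow algorithm, frame selection, and encoding may differ), the following packs the dense flow between two grayscale frames, e.g. onset and apex, into a two-channel magnitude/orientation map:

```python
import cv2
import numpy as np

def compact_flow(prev_gray, next_gray):
    """Illustrative preprocessing: dense optical flow between two grayscale
    frames, packed into a 2-channel (magnitude, orientation) array that a
    CNN with a 2-channel stem can consume. All parameter values here are
    assumptions, not the paper's exact settings."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # ang in [0, 2*pi)
    mag = cv2.normalize(mag, None, 0.0, 1.0, cv2.NORM_MINMAX)
    ang = ang / (2 * np.pi)
    return np.stack([mag, ang], axis=0).astype(np.float32)  # shape (2, H, W)
```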



Introduction

Emotion recognition has received much attention in the research community in recent years. Among the several sub-fields of emotion analysis, studies of facial expression recognition are active [1,2,3,4]. Ekman developed the Facial Action Coding System (FACS) to describe facial muscle movements according to action units, i.e., the fundamental actions of individual muscles or groups of muscles that can be combined to represent each of the facial expressions. These facial expressions can be labeled with codes based on the observed facial movements rather than on subjective classifications of emotion.
