NIRExpNet: Three-Stream 3D Convolutional Neural Network for Near Infrared Facial Expression Recognition

Zhan Wu, Ying Chen, Guangyuan Liu, Zhihao Zhang, Tong Chen

doi:10.3390/app7111184

Open Access

Journal: Applied Sciences
Publication Date: Nov 17, 2017
Citations: 10
License type: CC BY 4.0

Affiliation: Southwest University

Abstract

Facial expression recognition (FER) under active near-infrared (NIR) illumination has the advantage of illumination invariance. In this paper, we propose a three-stream 3D convolutional neural network, named NIRExpNet, for NIR FER. The 3D structure of NIRExpNet makes it possible to automatically extract not only spatial features but also temporal features. The multi-stream design enables NIRExpNet to fuse local and global facial expression features. To avoid over-fitting, NIRExpNet is kept moderate in size to suit the Oulu-CASIA NIR facial expression database, which is a medium-sized database. Experimental results show that the proposed NIRExpNet outperforms some previous state-of-the-art methods, such as Histogram of Oriented Gradient to 3D (HOG 3D), local binary patterns from three orthogonal planes (LBP-TOP), deep temporal appearance-geometry network (DTAGN), and adapt 3D Convolutional Neural Networks (3D CNN DAP).
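
The fusion of local and global features described in the abstract can be illustrated with a small sketch. It is not the NIRExpNet architecture: the abstract does not say which local regions feed the streams or how they are fused, so the upper-face and lower-face crops, the concatenation step, and every name below (crop, stream_features, three_stream_features) are assumptions made for illustration. Each trained 3D CNN stream is replaced by a hand-written stand-in that returns two numbers.

```python
from typing import List, Tuple

Clip = List[List[List[float]]]  # indexed as clip[time][row][col]


def crop(clip: Clip, rows: Tuple[int, int], cols: Tuple[int, int]) -> Clip:
    """Cut the same spatial window out of every frame."""
    (r0, r1), (c0, c1) = rows, cols
    return [[row[c0:c1] for row in frame[r0:r1]] for frame in clip]


def stream_features(clip: Clip) -> List[float]:
    """Hypothetical stand-in for one trained stream (needs >= 2 frames).

    Returns [mean intensity, mean absolute frame-to-frame change].
    """
    pixels_per_frame = len(clip[0]) * len(clip[0][0])
    total = sum(v for frame in clip for row in frame for v in row)
    change = sum(
        abs(b - a)
        for prev, nxt in zip(clip[:-1], clip[1:])
        for row_prev, row_nxt in zip(prev, nxt)
        for a, b in zip(row_prev, row_nxt)
    )
    return [
        total / (len(clip) * pixels_per_frame),
        change / ((len(clip) - 1) * pixels_per_frame),
    ]


def three_stream_features(
    clip: Clip, upper_rows: Tuple[int, int], lower_rows: Tuple[int, int]
) -> List[float]:
    """Fuse one global and two local streams by concatenation (assumed)."""
    width = len(clip[0][0])
    global_feat = stream_features(clip)
    upper_feat = stream_features(crop(clip, upper_rows, (0, width)))
    lower_feat = stream_features(crop(clip, lower_rows, (0, width)))
    return global_feat + upper_feat + lower_feat


# Toy 2-frame, 4x4 clip: only the lower half of the face changes.
frame0 = [[0.0] * 4 for _ in range(4)]
frame1 = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4]
fused = three_stream_features([frame0, frame1], upper_rows=(0, 2), lower_rows=(2, 4))
print(fused)  # [0.25, 0.5, 0.0, 0.0, 0.5, 1.0]
```

In this toy clip the lower-face stream reports a change of 1.0 while the global stream reports only 0.5, because the motion is diluted over the whole face. That is one intuition for why combining local and global streams might help.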

Highlights

Facial expression, as a carrier of emotion, conveys rich behavior information [1].
To automatically extract temporal features and improve the recognition rate, we present a three-dimensional convolutional neural network (3D CNN) structure, which can extract the spatio-temporal features of facial expressions.
Experimental results show that our proposed method for facial expression recognition (FER) achieves 78.42% recognition accuracy, which is higher than that of other recognition methods, such as Histogram of Oriented Gradient to 3D (HOG 3D) (60%), local binary patterns from three orthogonal planes (LBP-TOP) (72.33%), deep temporal appearance-geometry network (DTAGN) (66.67%), and adapt 3D Convolutional Neural Networks (3D CNN DAP) (72.12%).
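
To make the idea of spatio-temporal feature extraction concrete, here is a minimal sketch of a single 3D convolution written in plain Python. It is only an illustration of the operation, not the network from the paper: the kernel is a hand-picked temporal difference rather than a learned filter, and the names conv3d_valid and make_clip are hypothetical helpers. As in most deep learning libraries, the "convolution" is computed as cross-correlation, with no padding and stride 1.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as volume[time][row][col]


def conv3d_valid(clip: Volume, kernel: Volume) -> Volume:
    """3D cross-correlation with no padding and stride 1."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                # The kernel spans several frames as well as a spatial window.
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            acc += clip[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def make_clip(frame_values: List[float], height: int = 3, width: int = 3) -> Volume:
    """Build a clip whose frames are constant images with the given values."""
    return [[[float(v)] * width for _ in range(height)] for v in frame_values]


# Hand-picked 2x1x1 kernel: next frame minus current frame.
temporal_diff = [[[-1.0]], [[1.0]]]

static_response = conv3d_valid(make_clip([5, 5, 5]), temporal_diff)
moving_response = conv3d_valid(make_clip([5, 7, 4]), temporal_diff)

print(static_response[0][0][0])  # 0.0
print(moving_response[0][0][0], moving_response[1][0][0])  # 2.0 -3.0
```

A filter applied to one frame at a time cannot produce this response, because every frame in both clips is a constant image. Only a kernel that extends along the time axis separates the static clip from the changing one.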

Summary

Introduction

Facial expression, as a carrier of emotion, conveys rich behavior information [1]. Facial expression recognition (FER) has been a hot topic and has attracted attention in many fields, including human-computer interaction [2], security [3], and biometrics [4]. Earlier FER methods focused on still images and did not consider the motion information of facial expressions [5]. Since facial expression is a dynamic behavior, employing only still images is not sufficient for recognizing facial expressions. There are some traditional methods for extracting the dynamic features of facial expressions. Histogram of Oriented Gradient to 3D (HOG 3D) [6], as an extension of HOG, extracts the local temporal features

Methods

Results

Conclusion