Abstract

Facial expression recognition (FER) under active near-infrared (NIR) illumination has the advantage of illumination invariance. In this paper, we propose a three-stream 3D convolutional neural network, named NIRExpNet, for NIR FER. The 3D structure of NIRExpNet makes it possible to automatically extract not only spatial features but also temporal features. The multi-stream design of NIRExpNet enables it to fuse local and global facial expression features. To avoid over-fitting, NIRExpNet has a moderate size that suits the medium-sized Oulu-CASIA NIR facial expression database. Experimental results show that the proposed NIRExpNet outperforms several previous state-of-the-art methods, such as Histogram of Oriented Gradient to 3D (HOG 3D), Local binary patterns from three orthogonal planes (LBP-TOP), deep temporal appearance-geometry network (DTAGN), and adapt 3D Convolutional Neural Networks (3D CNN DAP).

Highlights

  • Facial expression, as a carrier of emotion, conveys rich behavioral information [1]

  • To automatically extract temporal features and improve the recognition rate, we present a three-dimensional convolutional neural network (3D CNN) structure in this research, which can extract the spatio-temporal features of facial expressions

  • Experimental results show that our proposed method for facial expression recognition (FER) achieves 78.42% recognition accuracy, which is higher than that of other recognition methods, such as Histogram of Oriented Gradient to 3D (HOG 3D) (60%), Local binary patterns from three orthogonal planes (LBP-TOP) (72.33%), deep temporal appearance-geometry network (DTAGN) (66.67%), and adapt 3D Convolutional Neural Networks (3D CNN DAP) (72.12%)


Summary

Introduction

Facial expression, as a carrier of emotion, conveys rich behavioral information [1]. Facial expression recognition (FER) has been a hot topic and has attracted attention in many fields, including human-computer interaction [2], security [3], and biometrics [4]. FER methods have often focused on still images, which do not capture the motion information of facial expressions [5]. Because facial expression is a dynamic behavior, still images alone are not sufficient for recognizing it. Some traditional methods extract dynamic features of facial expressions. Histogram of Oriented Gradient to 3D (HOG 3D) [6], an extension of HOG, extracts local temporal features.
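The point that still images miss motion information can be illustrated with a minimal sketch of a 3D convolution over a short clip. This is not the NIRExpNet architecture: the function `conv3d_valid`, the hand-set `temporal_diff` kernel, and the toy clips are assumptions introduced only for illustration, and a real 3D CNN would learn its kernels from data. The sketch only shows that a kernel spanning more than one frame responds to change over time.

```python
def conv3d_valid(clip, kernel):
    """Valid (no padding, stride 1) 3D cross-correlation on nested lists.

    clip is indexed [frame][row][col]; kernel is indexed [dt][dy][dx].
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the temporal axis as well as the two spatial axes.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical hand-set kernel: 2 frames deep, 1x1 in space.
# It computes (next frame - current frame) at every pixel.
temporal_diff = [[[-1.0]], [[1.0]]]

# Toy clip with no motion: three identical 2x2 frames.
static_clip = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(3)]

# Toy clip with motion: a bright pixel moves one column to the right
# between frame 0 and frame 1, then stays put.
moving_clip = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]

static_response = conv3d_valid(static_clip, temporal_diff)
moving_response = conv3d_valid(moving_clip, temporal_diff)
```

In this toy setting, `static_response` is zero everywhere, while `moving_response` is non-zero only for the frame pair in which the pixel moves. A kernel that sees one frame at a time cannot make that distinction, which is the motivation for the temporal dimension.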

