Abstract

Expression recognition is an important research direction for enabling computers to understand human emotions and for human-computer interaction. However, for 3D data such as video sequences, traditional convolutional neural networks have a complex structure and stretch the input 3D data into vectors. This not only causes a dimensional explosion but also discards structural information in 3D space, which both increases computational cost and lowers expression recognition accuracy. This paper proposes a facial expression recognition method for video sequences based on a Squeeze-and-Excitation and 3DPCA Network (SE-3DPCANet). A 3DPCA algorithm is introduced in the convolution layer to construct tensor convolution kernels directly, extracting the dynamic expression features of video sequences along both the spatial and temporal dimensions without tying the convolution kernels of adjacent frames together through shared weights. A Squeeze-and-Excitation Network is introduced in the feature encoding layer to automatically learn the weights of local channel features in the tensor features, increasing the representation capability of the model and further improving recognition accuracy. The proposed method is validated on three video facial expression datasets. Compared with other common expression recognition methods, it achieves higher recognition rates while significantly reducing the time required for training.
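As a rough illustration of the channel reweighting idea, the following is a minimal NumPy sketch of a generic Squeeze-and-Excitation step applied to a tensor feature of shape (channels, frames, height, width). It is not the SE-3DPCANet implementation: the function name `se_reweight`, the reduction ratio `r`, the tensor sizes, and the randomly initialised weights (which would be learned in practice) are all assumptions made for the example.

```python
import numpy as np

def se_reweight(features, w1, b1, w2, b2):
    """Generic SE step on a (C, T, H, W) feature tensor (illustrative only)."""
    # Squeeze: global average over time and space gives one value per channel.
    z = features.mean(axis=(1, 2, 3))                      # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(0.0, w1 @ z + b1)                  # shape (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))      # shape (C,), in (0, 1)
    # Reweight: broadcast each channel's gate over its whole feature volume.
    return features * scale[:, None, None, None], scale

# Hypothetical sizes; the weights are random stand-ins for learned parameters.
rng = np.random.default_rng(0)
C, T, H, W, r = 8, 4, 6, 6, 2
features = rng.standard_normal((C, T, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

out, scale = se_reweight(features, w1, b1, w2, b2)
```

The output keeps the shape of the input; each channel is multiplied by a scalar gate between 0 and 1, so channels the gate scores as more informative contribute more to the downstream encoding.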
