Abstract

Numerous methods have been proposed over the years to advance facial expression recognition, which remains a challenging task. The choice of image color space and the use of facial alignment as preprocessing steps may together have a significant impact on both the accuracy and the computational cost of facial emotion recognition, so both choices matter when optimizing the speed-accuracy trade-off. This paper proposes a deep learning-based facial emotion recognition pipeline that predicts the emotion of detected face regions in video sequences. Five well-known state-of-the-art convolutional neural network architectures are trained as emotion classifiers to identify the architecture that gives the best speed-accuracy trade-off. Two distinct facial emotion training datasets are prepared to investigate the effect of image color space and facial alignment on recognition performance. Experimental results show that training a facial expression recognition model on aligned grayscale facial images is preferable, as it offers better recognition rates with lower detection latency. The lightweight MobileNet_v1, with hyper-parameters WM=0.75 and RM=160, is identified as the best-performing model, achieving an overall accuracy of 86.42% on the testing video dataset.

