Abstract

Facial expression recognition (FER) for monitoring a driver's emotional state has become an increasing need for advanced driver assistance systems (ADAS). Although state-of-the-art recognition accuracy has been achieved in FER with the development of deep neural networks (DNNs) in recent years, FER in the real world remains challenging due to illumination and head pose variation. In this work, we propose a multi-modal fusion based FER model capable of recognizing facial expressions accurately regardless of lighting conditions and head poses, using a structured-light imaging camera that provides three image modalities: RGB, near-infrared (NIR), and depth maps. The model operates in two phases: the first phase extracts features from each single modality separately using a 3D ResNet, while the second phase combines the multi-modal features and classifies expressions. The model is trained and tested on a novel facial expression dataset with the three image modalities, covering varying lighting conditions and head poses. The experimental results show that combining different modalities improves the model's performance and robustness. A recognition accuracy of over 90% has been obtained in the usage scenario of FER for drivers.
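The two-phase design described above (per-modality feature extraction followed by late fusion and classification) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the 3D ResNet backbones are stood in for by fixed random projections, and the feature dimension, number of expression classes, and clip shapes are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 128     # assumed per-modality feature size (not from the paper)
NUM_CLASSES = 7    # assumed number of expression classes

def extract_features(clip, proj):
    """Phase 1 stand-in for a per-modality 3D ResNet:
    flatten each clip and apply a fixed projection with a nonlinearity."""
    return np.tanh(clip.reshape(clip.shape[0], -1) @ proj)

def fuse_and_classify(feats, w, b):
    """Phase 2: concatenate per-modality features and apply a softmax classifier."""
    fused = np.concatenate(feats, axis=1)  # (batch, 3 * FEAT_DIM)
    logits = fused @ w + b
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# Toy input clips: batch of 2, 8 frames of 32x32 pixels,
# with 3 channels for RGB and 1 channel each for NIR and depth.
rgb   = rng.standard_normal((2, 8, 3, 32, 32))
nir   = rng.standard_normal((2, 8, 1, 32, 32))
depth = rng.standard_normal((2, 8, 1, 32, 32))

projs = [rng.standard_normal((m.reshape(m.shape[0], -1).shape[1], FEAT_DIM)) * 0.01
         for m in (rgb, nir, depth)]
feats = [extract_features(m, p) for m, p in zip((rgb, nir, depth), projs)]

w = rng.standard_normal((3 * FEAT_DIM, NUM_CLASSES)) * 0.01
b = np.zeros(NUM_CLASSES)
probs = fuse_and_classify(feats, w, b)
print(probs.shape)  # one probability distribution over classes per clip
```

Concatenation is only one possible fusion operator; the key point the sketch captures is that each modality is encoded independently before the classifier sees the joint representation.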
