Abstract

In recent years, research on facial expression recognition (FER) under mask is trending. Wearing a mask for protection from Covid 19 has become a compulsion and it hides the facial expressions that is why FER under the mask is a difficult task. The prevailing unimodal techniques for facial recognition are not up to the mark in terms of good results for the masked face, however, a multimodal technique can be employed to generate better results. We proposed a multimodal methodology based on deep learning for facial recognition under a masked face using facial and vocal expressions. The multimodal has been trained on a facial and vocal dataset. We have used two standard datasets, M-LFW for the masked dataset and CREMA-D and TESS dataset for vocal expressions. The vocal expressions are in the form of audio while the faces data is in image form that is why the data is heterogenous. In order to make the data homogeneous, the voice data is converted into images by taking spectrogram. A spectrogram embeds important features of the voice and it converts the audio format into the images. Later, the dataset is passed to the multimodal for training. neural network and the experimental results demonstrate that the proposed multimodal algorithm outsets unimodal methods and other state-of-the-art deep neural network models.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.