Abstract

Deep learning techniques have drawn considerable interest in emotion recognition due to recent technological developments in healthcare analytics. Automatic patient emotion recognition can support healthcare analytics by giving healthcare stakeholders feedback on patients' conditions and satisfaction levels. In this paper, we propose a novel model-level fusion technique based on deep learning for enhanced emotion recognition from multimodal signals to monitor patients in connected healthcare. Representative visual features are extracted from the video signals through a Depthwise Separable Convolutional Neural Network, and optimized temporal attributes are derived from multiple physiological signals using Bi-directional Long Short-Term Memory (Bi-LSTM). A soft attention mechanism fuses the high-level features obtained from the two data modalities, focusing on the emotionally salient parts of the features to retrieve the most significant ones. We evaluate two face detection methods, Histogram of Oriented Gradients (HOG) and a Convolutional Neural Network-based face detector (ResNet-34), to observe the effect of facial features on emotion recognition. Lastly, extensive experimental evaluations are conducted on the widely used BioVid Emo DB multimodal dataset to verify the performance of the proposed architecture. Experimental results show that the developed fusion architecture improves the accuracy of emotion recognition from multimodal signals and outperforms both state-of-the-art techniques and baseline methods.
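
To make the fusion design concrete, the following is a minimal PyTorch sketch of the pipeline described above: a depthwise separable CNN branch over face crops, a Bi-LSTM branch over physiological time series, and soft attention over the two modality embeddings. All layer sizes, the number of physiological channels, the single-frame visual input, and the number of output classes are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class FusionEmotionNet(nn.Module):
    # physio_channels, hidden, and num_classes are assumed values.
    def __init__(self, physio_channels=4, hidden=128, num_classes=5):
        super().__init__()
        # Visual branch: depthwise separable conv blocks over a face crop.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            DepthwiseSeparableConv(32, 64, stride=2),
            DepthwiseSeparableConv(64, 128, stride=2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, hidden),
        )
        # Physiological branch: Bi-LSTM over a multichannel time series.
        self.bilstm = nn.LSTM(physio_channels, hidden // 2,
                              batch_first=True, bidirectional=True)
        # Soft attention: one scalar score per modality embedding.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, face, physio):
        v = self.visual(face)                             # (B, hidden)
        out, _ = self.bilstm(physio)                      # (B, T, hidden)
        p = out[:, -1]                                    # last-step summary
        feats = torch.stack([v, p], dim=1)                # (B, 2, hidden)
        weights = torch.softmax(self.attn(feats), dim=1)  # (B, 2, 1)
        fused = (weights * feats).sum(dim=1)              # weighted fusion
        return self.classifier(fused)

# Dummy usage: a batch of 96x96 face crops and 512-sample physiological windows.
model = FusionEmotionNet()
logits = model(torch.randn(2, 3, 96, 96), torch.randn(2, 512, 4))
print(logits.shape)  # torch.Size([2, 5])
```

The key point of model-level fusion is visible in `forward`: each modality is reduced to its own embedding first, and the soft attention weights decide how much each modality contributes to the fused representation before classification.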
