Abstract

Emotion recognition using miniaturised wearable physiological sensors has emerged as a revolutionary technology in various applications. However, detecting emotions using the fusion of multiple physiological signals remains a complex and challenging task. When fusing physiological signals, it is essential to consider the ability of different fusion approaches to capture the emotional information contained within and across modalities. Moreover, since physiological signals consist of time-series data, it becomes imperative to consider their temporal structures in the fusion process. In this study, we propose a temporal multimodal fusion approach with a deep learning model to capture the non-linear emotional correlation within and across electroencephalography (EEG) and blood volume pulse (BVP) signals and to improve the performance of emotion classification. The performance of the proposed model is evaluated using two different fusion approaches – early fusion and late fusion. Specifically, we use a convolutional neural network (ConvNet) long short-term memory (LSTM) model to fuse the EEG and BVP signals to jointly learn and explore the highly correlated representation of emotions across modalities, after learning each modality with a single deep network. The performance of the temporal multimodal deep learning model is validated on our dataset collected from smart wearable sensors and is also compared with results of recent studies. The experimental results show that the temporal multimodal deep learning models, based on early and late fusion approaches, successfully classified human emotions into one of four quadrants of dimensional emotions with an accuracy of 71.61% and 70.17%, respectively.
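The early- and late-fusion ConvNet–LSTM strategies described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the channel counts, window length, kernel sizes, and hidden sizes below are assumptions chosen for demonstration.

```python
import torch
import torch.nn as nn

class ConvLSTMBranch(nn.Module):
    """A 1-D ConvNet followed by an LSTM, modelling one physiological modality."""
    def __init__(self, in_channels, hidden_size=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(16, hidden_size, batch_first=True)

    def forward(self, x):             # x: (batch, channels, time)
        h = self.conv(x)              # (batch, 16, time // 2)
        h = h.transpose(1, 2)         # (batch, time // 2, 16)
        _, (h_n, _) = self.lstm(h)    # keep the final hidden state
        return h_n[-1]                # (batch, hidden_size)

class LateFusionNet(nn.Module):
    """Late fusion: learn each modality with its own branch, then fuse."""
    def __init__(self, eeg_ch=4, bvp_ch=1, n_classes=4):
        super().__init__()
        self.eeg_branch = ConvLSTMBranch(eeg_ch)
        self.bvp_branch = ConvLSTMBranch(bvp_ch)
        self.head = nn.Linear(64, n_classes)  # 32 + 32 fused features

    def forward(self, eeg, bvp):
        z = torch.cat([self.eeg_branch(eeg), self.bvp_branch(bvp)], dim=1)
        return self.head(z)

class EarlyFusionNet(nn.Module):
    """Early fusion: stack the raw EEG and BVP channels before one shared branch."""
    def __init__(self, eeg_ch=4, bvp_ch=1, n_classes=4):
        super().__init__()
        self.branch = ConvLSTMBranch(eeg_ch + bvp_ch)
        self.head = nn.Linear(32, n_classes)

    def forward(self, eeg, bvp):
        return self.head(self.branch(torch.cat([eeg, bvp], dim=1)))

# Dummy windows: 8 segments, 4 EEG channels and 1 BVP channel, 128 samples each.
eeg = torch.randn(8, 4, 128)
bvp = torch.randn(8, 1, 128)
print(LateFusionNet()(eeg, bvp).shape)   # torch.Size([8, 4])
print(EarlyFusionNet()(eeg, bvp).shape)  # torch.Size([8, 4])
```

Each output row holds four logits, one per quadrant of the dimensional emotion model; the key design difference is where the cross-modal correlation is learned — in the shared branch (early fusion) or in the classification head after per-modality encoding (late fusion).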

Highlights

  • Automated emotion recognition using lightweight body sensors and advanced machine learning technologies has been used in different application domains such as computer games [1], e-health [2], [3] and road safety [4]

  • We evaluated the temporal multimodal deep learning models and compared them with multimodal learning models based on a trialwise strategy and handcrafted feature extraction methods

  • It was shown that the performance of the temporal multimodal deep learning models using early and late fusion was higher than that of the multimodal learning models based on a non-temporal strategy, with recorded accuracies of 71.61 ± 2.71 and 70.17 ± 3.7 versus 55.07 ± 4.3 and 52.28 ± 4.6, respectively



Introduction

Automated emotion recognition using lightweight body sensors and advanced machine learning technologies has been used in different application domains such as computer games [1], e-health [2], [3] and road safety [4]. Lightweight wireless sensors in headbands and smart watches can be worn by individuals as they carry on their daily life activities. These sensors can record physiological signals such as blood volume pulse (BVP), electroencephalograms (EEG), skin temperature and skin conductance in a minimally invasive manner. Among the various physiological signals available, EEG and BVP have been found to be useful in inferring emotional states, since inner emotional states affect the body's physiological signals, including EEG and BVP [5], [6]. Although the accuracy of BVP is lower than that of electrocardiograms (ECGs), its simplicity has made BVP widely used in biosensors developed for applications such as predicting office workers' mental workload [20].

