SCEP—A New Image Dimensional Emotion Recognition Model Based on Spatial and Channel-Wise Attention Mechanisms

Bo Li, Feng Feng, Libiao Jin, Hui Ren, Xuekun Jiang, Fang Miao

doi:10.1109/access.2021.3057373

Abstract

Images are an important carrier for emotional expression. Humans can understand the emotions in an image easily and quickly, whereas extracting accurate emotions is a very challenging task for machines. In this study, we propose a novel spatial and channel-wise attention-based emotion prediction model, SCEP, to help computers recognize the emotions of images more accurately. SCEP integrates both a spatial attention mechanism and a channel-wise weight mechanism into a classical convolutional neural network (CNN) layer structure to predict image emotions, on the grounds that spatial attention can enhance the contrast between salient regions and potentially irrelevant regions, and that channel-wise weighting can emphasize informative features while suppressing less useful ones. The SCEP model outputs emotion values in a continuous 2-D valence-arousal space, so more emotions can be expressed than by simply classifying emotions into discrete categories. To validate the effectiveness of our model, we test it on an existing image dataset with a widespread emotion distribution. Extensive experiments show that, compared with base models (i.e., VGG and ResNet) without the spatial attention or channel-wise mechanisms, SCEP improves the accuracy of emotion prediction, evaluated by the concordance correlation coefficient (CCC), by approximately 3% to 5% in the arousal domain and approximately 3% to 6% in the valence domain. We therefore conclude that SCEP yields higher accuracy in emotion prediction.
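The abstract names two mechanisms, a spatial attention map that raises the contrast between salient and irrelevant regions and channel-wise weights that emphasize informative feature maps, but it does not give their equations. The sketch below is therefore only an illustration under stated assumptions: it uses a squeeze-and-excitation-style channel weighting followed by a simplified CBAM-style spatial mask, then a linear head that regresses a continuous (valence, arousal) pair. The function names, the reduction ratio `r`, the scalar mixing weights `a` and `b`, the channel-then-spatial order, and the unbounded linear output are all hypothetical choices, not SCEP's published design. Weights are random here; in a real model they would be learned inside a VGG or ResNet backbone.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """SE-style channel weighting (assumed form, not necessarily SCEP's).
    feat: (C, H, W) feature maps; w1: (C // r, C); w2: (C, C // r)."""
    squeeze = feat.mean(axis=(1, 2))          # (C,) global average pool per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)    # (C // r,) bottleneck with ReLU
    weights = sigmoid(w2 @ hidden)            # (C,) one weight in (0, 1) per channel
    return feat * weights[:, None, None], weights

def spatial_attention(feat, a=1.0, b=1.0):
    """Simplified spatial mask (assumed form). CBAM pools across channels and
    applies a 7x7 convolution; the scalars a and b stand in for that conv here."""
    avg_map = feat.mean(axis=0)               # (H, W) channel average at each position
    max_map = feat.max(axis=0)                # (H, W) channel max at each position
    mask = sigmoid(a * avg_map + b * max_map) # (H, W) one weight in (0, 1) per position
    return feat * mask[None, :, :], mask

def va_head(feat, w_out, bias):
    """Hypothetical regression head: global average pool, then a linear map to
    two continuous outputs interpreted as (valence, arousal)."""
    pooled = feat.mean(axis=(1, 2))           # (C,)
    return w_out @ pooled + bias              # (2,)

# Demo with random, untrained weights; the shapes are arbitrary choices.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
w_out = 0.1 * rng.standard_normal((2, C))
bias = np.zeros(2)

x, ch_w = channel_attention(feat, w1, w2)    # reweight channels
x, mask = spatial_attention(x)               # then reweight spatial positions
valence, arousal = va_head(x, w_out, bias)   # continuous 2-D emotion output
print(x.shape, ch_w.shape, mask.shape)       # (8, 4, 4) (8,) (4, 4)
```

Both mechanisms only rescale the feature tensor, so they leave its shape unchanged and can be dropped between existing CNN layers; this is why attention modules of this kind are usually described as plug-ins to a base network.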

Highlights

Images are one of the most important information carriers for humans in communicating with machines
We summarize the main contributions of our work as follows: 1) We propose a novel spatial and channel-wise attention-based emotion prediction model, dubbed SCEP
By evaluating the concordance correlation coefficient (CCC) at each training epoch, Figure 6 shows that the proposed SCEP model is capable of generalizing from the training images
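The results are reported as CCC values, computed separately for the valence and arousal outputs. As a minimal sketch, assuming Lin's standard definition with population (1/n) moments, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The paper's own implementation is not shown here, and the function name `ccc` is illustrative. Unlike Pearson correlation, CCC also penalizes a constant offset between predictions and labels.

```python
def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient using population moments."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mx = sum(y_true) / n
    my = sum(y_pred) / n
    vx = sum((x - mx) ** 2 for x in y_true) / n
    vy = sum((y - my) ** 2 for y in y_pred) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(y_true, y_pred)) / n
    denom = vx + vy + (mx - my) ** 2
    if denom == 0:
        return float("nan")  # both sequences constant and equal: CCC undefined
    return 2 * cov / denom

# Perfect agreement, a constant offset, and perfect disagreement.
print(ccc([1, 2, 3], [1, 2, 3]))  # 1.0
print(ccc([1, 2, 3], [2, 3, 4]))  # about 0.571 (4/7), although Pearson r is 1
print(ccc([1, 2, 3], [3, 2, 1]))  # -1.0
```

In a valence-arousal setting, one would call `ccc` once on the valence predictions and once on the arousal predictions, which matches reporting separate gains for the two domains.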

Summary

Introduction

Images are one of the most important information carriers for humans in communicating with machines. In certain settings, it has become essential for machines to understand or extract the emotions that images convey to viewers. With the rise of social media, an increasing number of Internet users express their opinions or emotions by posting images online, so artificial intelligence algorithms that help computers predict image emotions will help in understanding user opinions and behaviors more accurately [1]. For emotion prediction, a dimensional model is often used because it performs better at extracting emotions than discrete classification [3]–[5]. A common dimensional model describes emotion along two continuous axes, valence and arousal: valence represents pleasure, ranging from negative to positive, and arousal represents excitement, ranging from calm to excited. With the implementation of a dimensional model, emotion recognition can be regarded as

Methods

Results

Conclusion