Abstract
Images are an important carrier for emotional expression. Humans can understand the emotions in images easily and quickly, whereas extracting accurate emotions remains a very challenging task for machines. In this study, we propose a novel spatial and channel-wise attention-based emotion prediction model, SCEP, to help computers recognize the emotions of images more accurately. SCEP integrates both spatial attention and channel-wise weight mechanisms into a classical convolutional neural network (CNN) layer structure to predict image emotions, on the grounds that the spatial attention mechanism can enhance the contrast between salient regions and potentially irrelevant regions, and that the channel-wise weight mechanism can emphasize informative features while suppressing less useful ones. The SCEP model outputs emotion values in a continuous 2-D valence and arousal space, so it can express more emotions than a discrete classification of emotions can. To validate the effectiveness of our model, we test it on an existing image dataset with a wide emotion distribution. Extensive experiments show that, compared to base models (i.e., VGG and ResNet) without spatial attention or channel-wise mechanisms, SCEP improves the accuracy of emotion prediction (evaluated by the concordance correlation coefficient) by approximately 3%-5% in the arousal domain and by approximately 3%-6% in the valence domain. We therefore conclude that SCEP yields higher accuracy in emotion prediction.
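To make the two mechanisms concrete, the following is a minimal, parameter-free sketch of channel-wise weighting and spatial attention applied to a small feature map. It is not the SCEP architecture: the function names, the use of plain averages, and the sigmoid gates are all assumptions chosen for illustration, and a trained model would normally learn these gates with additional layers rather than compute them directly from averages.

```python
import math


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_weights(fmap):
    """One gate per channel: sigmoid of that channel's global average.

    fmap is a list of C channels, each an H x W nested list of floats.
    (Assumption: a learned model would pass the averages through
    trainable layers before the sigmoid.)
    """
    gates = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates


def spatial_mask(fmap):
    """One gate per position: sigmoid of the mean activation across channels."""
    n_channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [sigmoid(sum(fmap[c][i][j] for c in range(n_channels)) / n_channels)
         for j in range(width)]
        for i in range(height)
    ]


def apply_attention(fmap):
    """Rescale every activation by its channel gate and its spatial gate."""
    gates = channel_weights(fmap)
    mask = spatial_mask(fmap)
    return [
        [[value * gates[c] * mask[i][j] for j, value in enumerate(row)]
         for i, row in enumerate(channel)]
        for c, channel in enumerate(fmap)
    ]


# Hypothetical 2-channel, 2 x 2 feature map with one salient activation.
feature_map = [
    [[4.0, 0.0], [0.0, 0.0]],  # channel 0: informative
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1: silent
]
attended = apply_attention(feature_map)
```

In this toy input, the informative channel receives a larger gate than the silent one, and the salient position receives a larger spatial gate than the background, which is the contrast-enhancing behavior the abstract attributes to the two mechanisms.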
Highlights
Images are one of the most important information carriers for humans in communicating with machines
We summarize the main contributions of our work as follows: 1) We propose a novel spatial and channel-wise attention-based emotion prediction model, dubbed SCEP
Results (training and evaluation): By evaluating the concordance correlation coefficient (CCC) at each training epoch, Figure 6 shows that the proposed SCEP model is capable of generalizing from the training images
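For readers unfamiliar with the metric, the concordance correlation coefficient can be sketched in a few lines. The snippet below uses the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (divide-by-n) statistics. The function name `concordance_cc` and the sample values are hypothetical, and the paper's own evaluation code may differ in details such as the variance estimator.

```python
def concordance_cc(pred, target):
    """Concordance correlation coefficient between two equal-length sequences.

    Uses population (divide-by-n) variance and covariance. Unlike Pearson
    correlation, CCC also penalizes a systematic offset or scale mismatch
    between predictions and targets.
    """
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and equally long")

    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n

    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


# Hypothetical valence predictions with a constant offset of +1 from the targets.
targets = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
ccc_shifted = concordance_cc(shifted, targets)  # 4/7: perfectly correlated, but biased
```

With a two-dimensional output, the same function would simply be applied once to the valence values and once to the arousal values.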
Summary
Images are one of the most important information carriers for humans in communicating with machines. Machines are increasingly required to understand or extract the emotions that images convey to viewers, and this ability has become essential on certain occasions. With social media thriving, an increasing number of Internet users express their opinions or emotions by posting images online, so artificial intelligence algorithms that assist computers in image emotion prediction will help in understanding user opinions and behaviors more accurately [1]. For emotion prediction, a dimensional model is often used because it performs better in extracting emotions [3]–[5]. In such a model, valence represents pleasure, ranging from negative to positive, and arousal represents excitement, ranging from calm to excited. With the implementation of a dimensional model, emotion recognition can be regarded as