Abstract
Affective analysis of images in social networks has drawn much attention, and the texts surrounding images have proven to provide valuable semantic cues about image content that can hardly be represented by low-level visual features. In this paper, we propose a novel approach to the visual affective classification (VAC) task. This approach combines visual representations with novel text features through a fusion scheme based on Dempster-Shafer (D-S) Evidence Theory. Specifically, we not only investigate different types of visual features and fusion methods for VAC, but also propose textual features that effectively capture emotional semantics from the short texts associated with images, based on word similarity. Experiments are conducted on three publicly available databases: the International Affective Picture System (IAPS), the Artistic Photos and the MirFlickr Affect set. The results demonstrate that the proposed approach combining visual and textual features provides promising results for the VAC task.
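To make the fusion scheme concrete, the sketch below implements Dempster's rule of combination, the core of D-S Evidence Theory: two mass functions (here, hypothetical visual and textual classifiers assigning belief mass to sets of emotion labels) are merged by multiplying masses of intersecting hypothesis sets and renormalizing by the conflict. This is a generic illustration of the rule, not the paper's exact fusion pipeline; the label sets and mass values are invented for the example.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with
    Dempster's rule: masses of intersecting focal sets are multiplied
    and summed, then renormalized by 1 - K, where K is the total
    conflicting mass (pairs whose intersection is empty)."""
    combined = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; rule undefined.")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Hypothetical example: a visual and a textual classifier assign mass
# over the frame {pos, neg} (values are illustrative only).
m_visual = {frozenset({"pos"}): 0.6, frozenset({"pos", "neg"}): 0.4}
m_text = {frozenset({"pos"}): 0.5, frozenset({"neg"}): 0.3,
          frozenset({"pos", "neg"}): 0.2}
fused = dempster_combine(m_visual, m_text)
print(fused)  # mass on {pos} ≈ 0.756 after renormalization
```

Note that the renormalization step is what lets two partially conflicting sources still yield a proper mass function summing to one.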
Highlights
Visual object classification (VOC) targets the classification of objects in images at the cognitive level.
We propose the emotional Histogram of Textual Concepts (eHTC) for visual affective classification (VAC), which computes a histogram of textual concepts over an emotional dictionary, where each bin accumulates the contribution of each word to the underlying concept according to a predefined semantic similarity measure.
Images can help text-based analysis, e.g. sentiment analysis [84, 85], and texts can improve image-based classification, such as the VOC task [86, 87, 88]. These works show that multimodal approaches can combine the strengths and complementary information of each source and achieve better classification results than a single modality, which is confirmed by our results on the VAC task.
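The eHTC construction named in the highlights can be sketched in a few lines: each bin of the histogram accumulates, over all words in the image's accompanying text, the semantic similarity between that word and one concept of the emotional dictionary. The paper assumes a predefined semantic similarity measure (e.g. a WordNet-based one); the tiny lookup-table similarity below is a stand-in for illustration only, and the dictionary and tokens are invented.

```python
def ehtc(tokens, emotion_dict, sim):
    """emotional Histogram of Textual Concepts: one bin per dictionary
    concept; each bin sums the similarity of every text token to that
    concept, so related words contribute partial mass instead of
    requiring exact matches."""
    hist = [0.0] * len(emotion_dict)
    for token in tokens:
        for i, concept in enumerate(emotion_dict):
            hist[i] += sim(token, concept)
    return hist


# Toy similarity stand-in for a real semantic measure (hypothetical
# values; an exact match scores 1.0, listed pairs score their table
# value, everything else scores 0).
SIM_TABLE = {("joyful", "happy"): 0.9, ("gloomy", "sad"): 0.8}


def toy_sim(word, concept):
    if word == concept:
        return 1.0
    return SIM_TABLE.get((word, concept), 0.0)


hist = ehtc(["joyful", "gloomy", "happy"], ["happy", "sad"], toy_sim)
print(hist)  # happy bin = 0.9 + 1.0 = 1.9, sad bin = 0.8
```

The key design point is that eHTC is a soft assignment: a word like "joyful" still contributes to the "happy" bin even though it never appears in the dictionary, which is what lets short, sparse social-media texts produce a dense emotional descriptor.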
Summary
Visual object classification (VOC) targets the classification of objects in images at the cognitive level. Visual affective classification (VAC) aims to identify the emotions that are expected to arise in image viewers at the affective level, which proves extremely challenging due to the semantic gap between low-level visual features and high-level emotion-related concepts [1, 2]: people from different backgrounds or cultures may perceive the same visual content quite differently. Nevertheless, recent works on affective computing [3, 6, 7, 8, 9] argue that certain image features have universal validity for classifying images by affective concept: they are believed to evoke particular human feelings and to exhibit a certain stability and generality across different people and cultures.