Abstract

This paper proposes a visual attention-based method for emotion classification through eye-gaze analysis. Concretely, we propose tensor-based emotional category classification via visual attention-based heterogeneous convolutional neural network (CNN) feature fusion. Motivated by the relationship between human emotions and temporal changes in visual attention, the proposed method constructs a new gaze-based image representation that reflects how visual attention changes over time. Furthermore, since the emotions an image evokes are closely related to the objects it contains, our method uses a CNN model to obtain features that capture their characteristics. To improve the representation of emotional categories, we extract multiple CNN features from our novel gaze-based image representation and fuse them by constructing a novel tensor from these features. This tensor construction realizes the visual attention-based heterogeneous CNN feature fusion and is the main contribution of this paper. Finally, emotional category classification becomes feasible by applying logistic tensor regression with general tensor discriminant analysis to the newly constructed tensor. Experimental results show that the proposed method achieves an F1-measure of approximately 0.6, about a 10% improvement over comparative methods including state-of-the-art methods, verifying its effectiveness.
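The fusion step described above can be sketched in a few lines of NumPy. This is a minimal, illustrative sketch only: the dimensions, the use of equal-sized features from each backbone, and the single weight tensor for the logistic step are all assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: T gaze-ordered image patches, each described by a
# D-dimensional feature vector from each of M heterogeneous CNN backbones.
# (Equal feature dimensions across backbones are assumed here for simplicity.)
T, M, D = 8, 3, 16

# Stand-ins for CNN features extracted from the gaze-based representation.
per_backbone_features = [rng.standard_normal((T, D)) for _ in range(M)]

# Fuse the heterogeneous features into a 3rd-order tensor:
# mode-1 = backbone, mode-2 = gaze order (time), mode-3 = feature dimension.
fusion_tensor = np.stack(per_backbone_features, axis=0)
assert fusion_tensor.shape == (M, T, D)

# A simple stand-in for logistic tensor regression: contract the fused
# tensor with a weight tensor of the same shape, then apply a sigmoid.
W = rng.standard_normal((M, T, D))
logit = np.tensordot(fusion_tensor, W, axes=3)  # full inner product -> scalar
prob = 1.0 / (1.0 + np.exp(-logit))            # probability of one category
print(fusion_tensor.shape, float(prob))
```

In the actual method the weight tensor would be learned, and general tensor discriminant analysis would first project the tensor into a discriminative subspace; the sketch only shows how heterogeneous CNN features can be arranged into one tensor and scored.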

Highlights

  • We propose a new method for tensor-based emotional category classification via visual attention-based heterogeneous convolutional neural network (CNN) feature fusion

  • We present experimental results that verify the effectiveness of the proposed method

Introduction

Due to the increasing number of images on the Web, the demand for image understanding has increased [1,2,3]. Image understanding mainly draws on two types of information: image-based information and human-based information. Using image-based information such as textures and luminance gradients, many researchers have investigated semantic segmentation and object recognition [4,5,6,7,8,9]. Using human-based information such as brain activity and gaze movements, others have investigated image emotion recognition and interest level estimation [10,11,12,13,14]. We divide image understanding into image-based understanding and human-based understanding, corresponding to the first and second types of information, respectively. The recent development of convolutional neural networks (CNNs) [4] has enabled image-based understanding with high performance [4,5,6,7,8,9], human-based understanding

