Abstract

Attending selectively to emotion-eliciting stimuli is intrinsic to human vision. In this research, we investigate how the emotion-eliciting features of images relate to human selective attention. We create the EMOtional attention dataset (EMOd), a set of diverse emotion-eliciting images, each annotated with (1) eye-tracking data from 16 subjects and (2) image context labels at both the object and scene level. Based on analyses of human perception of EMOd, we report an emotion prioritization effect: emotion-eliciting content draws stronger and earlier human attention than neutral content, but this advantage diminishes dramatically after the initial fixation. We also find that human attention is more focused on awe-eliciting and aesthetic vehicle and animal scenes in EMOd. To model this human attention behavior computationally, we design a deep neural network (CASNet II) that includes a channel-weighting subnetwork, which prioritizes emotion-eliciting objects, and an Atrous Spatial Pyramid Pooling (ASPP) structure, which learns the relative importance of image regions at multiple scales. Visualizations and quantitative analyses demonstrate the model's ability to simulate human attention behavior, especially on emotion-eliciting content.
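As background on the ASPP structure mentioned above: atrous (dilated) convolution spaces the kernel taps apart, enlarging the receptive field without adding weights, and ASPP runs several dilation rates in parallel to capture context at multiple scales. A minimal pure-Python 1D sketch of the dilation idea follows; it is illustrative only, and the `dilated_conv1d` helper is hypothetical rather than part of CASNet II, which operates on 2D feature maps.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1D atrous (dilated) convolution with 'valid' padding.

    Kernel taps are spaced `dilation` samples apart, so the
    receptive field grows with the rate while the number of
    weights stays fixed.
    """
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    out = []
    for i in range(len(signal) - span):
        out.append(sum(kernel[k] * signal[i + k * dilation]
                       for k in range(len(kernel))))
    return out


# An ASPP-style module would apply several rates in parallel
# over the same input and fuse the resulting feature maps.
x = [1, 2, 3, 4, 5, 6, 7, 8]
k = [1, 0, -1]
print(dilated_conv1d(x, k, 1))  # fine scale:   [-2, -2, -2, -2, -2, -2]
print(dilated_conv1d(x, k, 2))  # coarser scale: [-4, -4, -4, -4]
```

With rate 1 each output sees 3 consecutive samples; with rate 2 the same 3-tap kernel spans 5 samples, which is how ASPP weighs image regions at multiple scales with a shared parameter budget.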
