Visual complexity analysis using deep intermediate-layer features

Elham Saraee, Mona Jalal, Margrit Betke

doi:10.1016/j.cviu.2020.102949

Open Access

https://doi.org/10.1016/j.cviu.2020.102949

Journal: Computer Vision and Image Understanding
Publication Date: Apr 2, 2020
Citations: 36
License type: CC BY

Affiliation: Boston University

Abstract

In this paper, we focus on visual complexity, an image attribute that humans can subjectively evaluate based on the level of detail in the image. We explore unsupervised information extraction from intermediate convolutional layers of deep neural networks to measure visual complexity, and we derive an activation energy metric that combines convolutional layer activations to quantify it. To show the effectiveness of the proposed metric for various applications, we introduce Savoias, a visual complexity dataset that comprises more than 1,400 images from seven diverse image categories (e.g., advertisement and interior design). Our deep neural network-based measure of visual complexity correlates highly with human-curated ground-truth (GT) scores across widely used network architectures (e.g., VGG16, ResNet-v2-152, and EfficientNet) and in networks trained on two classification tasks (object and scene classification). This result reveals that intermediate convolutional layers of deep neural networks carry information about image complexity that is meaningful to people. Furthermore, we show that our method of measuring visual complexity outperforms traditional methods on Savoias and two other state-of-the-art benchmark datasets. We also analyze in detail the performance difference between our unsupervised method and a supervised method trained on the feature map, and show that supervision improves the prediction. Finally, we demonstrate that, within the context of a category, visually more complex images are also more memorable to human observers.
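
As a rough illustration of scoring an image by the energy of its intermediate activations, the sketch below averages the activations of one convolutional feature map and correlates the per-image scores with ground-truth ratings. The abstract does not give the metric's exact form, so the plain averaging, the function names, the `[height][width][channels]` layout, the toy values, and the choice of Pearson correlation are all assumptions made for illustration.

```python
from statistics import mean


def activation_energy(feature_map):
    """Mean activation over all positions and channels of one layer.

    feature_map is a nested list shaped [height][width][channels]
    (an assumed layout; the source does not specify one).
    """
    values = [v for row in feature_map for cell in row for v in cell]
    return mean(values)


def pearson(a, b):
    """Pearson correlation between two equal-length score lists."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Toy 1x2x2 feature maps standing in for real network activations.
toy_maps = [
    [[[0.0, 0.2], [0.1, 0.1]]],
    [[[0.5, 0.7], [0.4, 0.8]]],
    [[[1.5, 0.9], [1.1, 1.3]]],
]
# Hypothetical human complexity ratings for the same three images.
toy_gt_scores = [10.0, 35.0, 80.0]

energies = [activation_energy(m) for m in toy_maps]
correlation = pearson(energies, toy_gt_scores)
```

In practice the feature maps would come from a chosen intermediate layer of a pretrained network such as VGG16. Which layer to use, and how to combine several layers, are design choices this sketch leaves open.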

Full Text