Residual Attention Network vs Real Attention on Aesthetic Assessment

Ranju Mandal, Rod M Connolly, Susanne Becken, Bela Stantic

doi:10.1007/978-981-16-1685-3_26

Abstract

Photo aesthetics assessment is a challenging problem. Deep Convolutional Neural Network (CNN)-based algorithms have achieved promising results for aesthetics assessment in recent times. Lately, a few efficient and effective attention-based CNN architectures have been proposed that improve learning efficiency by adaptively adjusting the weight of each patch during the training process. In this paper, we investigate how real human attention, used in place of the synthetic attention of a CNN-based attention network architecture, affects image aesthetic assessment. For our research, a dataset consisting of a large number of images along with eye-tracking information has been developed using an eye-tracking device (https://www.tobii.com/group/about/this-is-eye-tracking/) powered by sensor technology, and this is the first study of its kind in image aesthetic assessment. We adopted the Residual Attention Network and ResNet architectures, which achieve state-of-the-art performance on image recognition tasks on benchmark datasets. We report our findings on photo aesthetics assessment with two sets of datasets, consisting of original images and images with masked attention patches, which demonstrate higher accuracy when compared to the state-of-the-art methods.

Full Text