Abstract

Personalized Image Aesthetics Assessment (PIAA) is highly subjective, as aesthetic preferences vary greatly from person to person. Generic models struggle to capture each individual's unique preferences, and PIAA must typically learn from only a limited number of samples per user. Furthermore, it requires a holistic consideration of diverse visual features in images, both local and global. To address these challenges, we propose an innovative network that combines Transformers and Convolutional Neural Networks (CNNs) with meta-learning for PIAA (TCML-PIAA), where meta-learning enables the model to adapt to each user's preferences from limited samples. First, we leverage Vision Transformer (ViT) blocks and CNNs to capture long-range and short-range dependencies, respectively, mining richer, heterogeneous aesthetic attributes from the two branches. Second, to fuse these distinct features effectively, we introduce an Aesthetic Feature Interaction Module (AFIM), designed to seamlessly integrate the aesthetic features extracted by the CNN and ViT branches and to enable aesthetic information to interact and fuse across the two feature representations. We also incorporate a Channel-Spatial Attention Module (CSAM), embedded within both the CNN branch and the AFIM, to sharpen the perception of different image regions and further mine the aesthetic cues in images. Experimental results demonstrate that our TCML-PIAA outperforms existing state-of-the-art methods on benchmark databases.
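The abstract does not specify the internals of AFIM or CSAM; as a rough, non-authoritative sketch, the PyTorch snippet below pairs a CBAM-style channel-spatial gate (standing in for CSAM) with a simple projection-and-concatenation fusion (standing in for AFIM). All dimensions, the 7x7 spatial kernel, and the fusion recipe are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class CSAM(nn.Module):
    """Channel-Spatial Attention Module: a CBAM-style stand-in.

    The paper names CSAM but the abstract omits its design; this
    squeeze-and-excite channel gate followed by a 7x7 spatial gate
    is an assumption.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: pool spatially, then reweight each channel.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: pool channels, then reweight each location.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)
        avg_map = x.mean(dim=1, keepdim=True)   # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)   # (B, 1, H, W)
        return x * self.spatial_gate(torch.cat([avg_map, max_map], dim=1))


class AFIM(nn.Module):
    """Aesthetic Feature Interaction Module: a sketch that projects the
    CNN and ViT feature maps to a common width, concatenates them, and
    lets a CSAM reweight the fused tensor before mixing."""

    def __init__(self, cnn_dim: int, vit_dim: int, fused_dim: int = 256):
        super().__init__()
        self.proj_cnn = nn.Conv2d(cnn_dim, fused_dim, kernel_size=1)
        self.proj_vit = nn.Conv2d(vit_dim, fused_dim, kernel_size=1)
        self.csam = CSAM(2 * fused_dim)
        self.mix = nn.Conv2d(2 * fused_dim, fused_dim, kernel_size=1)

    def forward(self, f_cnn: torch.Tensor, f_vit: torch.Tensor) -> torch.Tensor:
        # f_cnn: (B, C1, H, W) local CNN features.
        # f_vit: (B, C2, H, W) ViT patch tokens reshaped onto the same grid.
        fused = torch.cat([self.proj_cnn(f_cnn), self.proj_vit(f_vit)], dim=1)
        return self.mix(self.csam(fused))


if __name__ == "__main__":
    afim = AFIM(cnn_dim=512, vit_dim=768)
    f_cnn = torch.randn(2, 512, 14, 14)   # e.g. a ResNet stage-4 map
    f_vit = torch.randn(2, 768, 14, 14)   # e.g. 196 ViT tokens as a 14x14 grid
    print(afim(f_cnn, f_vit).shape)       # torch.Size([2, 256, 14, 14])
```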
