Visual aesthetics has long been an active area of computer vision research. To improve performance on image aesthetic evaluation, this paper presents a novel self-supervised image aesthetic evaluation model built on Transformers. To capture richer visual representations, we expand the pretext task with a masked-image inpainting branch that runs in parallel with the aesthetic quality degradation tasks. The model is trained with an uncertainty weighting method that combines the three losses into a single objective. On the AVA dataset, our approach outperforms existing self-supervised image aesthetic assessment methods and approaches the results of supervised methods despite training on limited data. On the AADB dataset, it improves binary aesthetic classification accuracy by roughly 16% over other self-supervised image aesthetic assessment methods and yields better aesthetic attribute predictions.
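The abstract does not spell out how the uncertainty weighting is implemented; a common formulation for combining multiple task losses this way uses learnable per-task log-variances, as in Kendall et al. (2018). The sketch below illustrates that scheme under those assumptions; the class name, the three example losses, and the initialization are hypothetical, not the paper's code.

```python
import torch
import torch.nn as nn


class UncertaintyWeightedLoss(nn.Module):
    """Combines multiple task losses via learnable homoscedastic-uncertainty
    weights (one possible realization of uncertainty weighting)."""

    def __init__(self, num_tasks: int = 3):
        super().__init__()
        # One learnable log-variance per task, initialized to 0 (sigma = 1).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        # losses: sequence of scalar tensors, one per pretext task
        # (e.g., an inpainting loss plus two degradation-related losses).
        total = torch.zeros((), device=self.log_vars.device)
        for log_var, loss in zip(self.log_vars, losses):
            precision = torch.exp(-log_var)  # 1 / sigma^2
            total = total + precision * loss + log_var  # weighted loss + regularizer
        return total


# Hypothetical usage: fold three pretext losses into one training objective.
combiner = UncertaintyWeightedLoss(num_tasks=3)
l_inpaint = torch.tensor(0.8)
l_degrade_a = torch.tensor(1.2)
l_degrade_b = torch.tensor(0.5)
total_loss = combiner([l_inpaint, l_degrade_a, l_degrade_b])
total_loss.backward()  # gradients flow into the learnable log-variances
```

Because the log-variances are optimized jointly with the network, each task's contribution is rescaled automatically during training rather than tuned by hand.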