Content-aware image retargeting (CAIR) techniques are crucial in multimedia processing for displaying images on devices of varying sizes while preserving visually salient content with desirable visual effects. Existing algorithms are either discrete or continuous: the former produce artefacts when the foreground proportion exceeds the retargeting ratio, while the latter tend to squeeze salient regions. In this paper, we reformulate retargeting as sampling the salient signal and reconstructing it under aesthetic supervision, yielding the supervised multi-class image retargeting reconstruction (SMART) framework. In the encoder phase, the target image is decomposed into complementary masked and unmasked parts according to saliency. A long-range sampling algorithm computes similarities along an 8-connected planar path while accounting for both spatial distance and feature correlation. The sampled embeddings in the latent space reconstruct the retargeted images under supervised signals for aesthetic quality: a semantic loss Lsem, derived from the pretrained CLIP model, maintains consistency of content and semantics, while a supervised loss Lir ensures that the retargeted results stay close to the preferred labels. We also release a new retargeting dataset comprising seven image classes (animal, building, car, flower, indoor, landscape and people) with supervision labels collected from designers, to support further study of aesthetic retargeting. Ablation studies confirm the effectiveness of the new dataset, and comparative experiments with state-of-the-art baselines demonstrate the advantages of the proposed method.