Abstract

Many aesthetic models in multimedia and computer vision suffer from two shortcomings: 1) the low descriptiveness and interpretability [1] of hand-crafted aesthetic criteria (i.e., they fail to indicate region-level aesthetics) and 2) the difficulty of engineering aesthetic features adaptively and automatically for different image sets. To remedy these problems, we develop a deep architecture that learns aesthetically relevant visual attributes from Flickr [2], which are localized by multiple textual attributes in a weakly supervised setting. More specifically, using a bag-of-words representation of the frequent Flickr image tags, a sparsity-constrained subspace algorithm discovers a compact set of textual attributes for each Flickr image, where each textual attribute is a sparse linear representation of those frequent image tags. Then, a weakly supervised learning algorithm projects the image-level textual attributes onto the highly responsive image patches. These patches indicate which appealing regions humans look at with respect to each textual attribute, and they are used to learn the visual attributes. Psychological and anatomical studies have demonstrated that humans perceive visual concepts hierarchically. Therefore, we normalize these patches and feed them into a five-layer convolutional neural network to mimic the hierarchy in which humans perceive the visual attributes. We apply the learned deep features to applications such as image retargeting, aesthetics ranking, and retrieval. Both subjective and objective experimental results demonstrate the superiority of our approach.

[1] In this paper, "descriptiveness" and "interpretability" mean the ability to seek a region-level representation of each mined textual attribute, i.e., a sparse linear representation of those frequent image tags.
[2] https://www.flickr.com/
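
The first stage described above (a bag-of-words over frequent tags, and textual attributes that are sparse linear combinations of those tags) can be illustrated with a small sketch. This is not the paper's algorithm: the sparsity-constrained subspace method is not specified in the abstract, so plain soft-thresholding (the proximal operator of the L1 norm) stands in for it here. All function names, the toy tag lists, the dense weights, and the threshold `lam` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_vocabulary(tag_lists, min_count=2):
    """Keep tags that occur in at least `min_count` images, sorted by name."""
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    return sorted(tag for tag, c in counts.items() if c >= min_count)


def bag_of_words(tags, vocabulary):
    """Count how often each vocabulary tag appears in one image's tag list."""
    counts = Counter(tags)
    return [counts[tag] for tag in vocabulary]


def soft_threshold(weights, lam):
    """Shrink every weight toward zero by `lam`; small weights become zero.

    Assumption: a stand-in for the paper's sparsity constraint.
    """
    return [max(abs(w) - lam, 0.0) * (1 if w > 0 else -1) for w in weights]


def attribute_response(bow, attribute_weights):
    """Linear response of one image to one textual attribute."""
    return sum(b * w for b, w in zip(bow, attribute_weights))


# Toy tag lists for four images (invented data).
tag_lists = [
    ["sunset", "beach", "sky"],
    ["sunset", "sky", "city"],
    ["portrait", "bokeh"],
    ["portrait", "bokeh", "sky"],
]

vocabulary = build_vocabulary(tag_lists, min_count=2)
bows = [bag_of_words(tags, vocabulary) for tags in tag_lists]

# A hypothetical dense weight vector over the vocabulary, made sparse.
dense_attribute = [0.05, -0.1, 0.6, 0.9]
sparse_attribute = soft_threshold(dense_attribute, lam=0.2)

responses = [attribute_response(bow, sparse_attribute) for bow in bows]
```

In this toy setting the sparse attribute keeps only the weights on the tags `sky` and `sunset`, so the first two images respond most strongly to it. In the approach described above, such image-level responses would then be projected onto image patches by the weakly supervised step, which this sketch does not attempt.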

