Abstract
In this paper, we study how to extract visual concepts to understand landscape scenicness. Using visual feature representations from a Convolutional Neural Network (CNN), we learn a number of Concept Activation Vectors (CAVs) aligned with semantic concepts from ancillary datasets. These concepts represent objects, attributes or scene categories that describe outdoor images. We then use these CAVs to study their impact on the (crowdsourced) perception of beauty of landscapes in the United Kingdom. Finally, we deploy a technique to explore new concepts beyond those initially available in the ancillary dataset: using a semi-supervised manifold alignment technique, we align the CNN image representation to a large set of word embeddings, therefore giving access to entire dictionaries of concepts. This allows us to obtain a list of new concept candidates to improve our understanding of the elements that contribute the most to the perception of scenicness. We do this without the need for any additional data by leveraging the commonalities in the visual and word vector spaces. Our results suggest that new and potentially useful concepts can be discovered by leveraging neighbourhood structures in the word vector spaces.
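The core operation behind a CAV can be illustrated with a small sketch: a CAV is the normal to a hyperplane separating the CNN activations of images containing a concept from those of random counterexamples. The snippet below is a minimal, self-contained illustration; the feature vectors are synthetic stand-ins for real CNN activations, and the dimensions and class sizes are arbitrary choices for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for CNN activations: in practice these would be
# feature vectors from an intermediate CNN layer; here we draw synthetic data
# with a mean shift between the two classes.
dim = 64
concept_feats = rng.normal(loc=1.0, size=(100, dim))  # images showing the concept
random_feats = rng.normal(loc=0.0, size=(100, dim))   # random counterexamples

X = np.vstack([concept_feats, random_feats])
y = np.array([1] * 100 + [0] * 100)

# Fit a linear classifier; the CAV is the (unit-normalised) normal vector
# of the separating hyperplane.
clf = LogisticRegression(max_iter=1000).fit(X, y)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# Score a new image by projecting its activation onto the CAV:
# larger projections indicate a stronger presence of the concept.
new_feat = rng.normal(loc=1.0, size=dim)
concept_score = float(new_feat @ cav)
```

Once such unit vectors are available for many concepts, the projection of an image's activation onto each CAV gives a per-concept score that can be related to crowdsourced scenicness ratings.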
Highlights
The combination of advances in deep learning methods, in particular in the form of deep Convolutional Neural Networks (CNN), and the abundance of User Generated Content (UGC) opens up the possibility of studying how users perceive their surroundings at an unprecedented level of detail and at scale
We propose a methodology to explore a broad range of concepts that relate to landscape scenicness by using Concept Activation Vectors (CAVs) computed from a visual dataset of generic concepts
We explore the use of cross-domain manifold alignment to enlarge the concepts space with a corpus made of word embeddings
Summary
The combination of advances in deep learning methods, in particular in the form of deep Convolutional Neural Networks (CNN), and the abundance of User Generated Content (UGC) opens up the possibility of studying how users perceive their surroundings at an unprecedented level of detail and at scale. If we are interested in estimating the scenic value of a place, we would like this estimation to be independent of confounding factors such as photographer biases or specific lighting conditions. Such an approach relies on a manually chosen set of concepts that are expected to relate to scenicness and an auxiliary dataset of images where the presence of these concepts is annotated. We achieve this by expanding the set of concepts available in a visual dataset of concepts (e.g., materials, textures, objects) with a series of new concepts found by comparing visual vectors (from the CNN feature space) and word embedding vectors. This is in contrast to existing methods for concept discovery, where either new concepts are not assigned a semantic label [5], or human-in-the-loop systems [6] or per-image annotations [7] are used to learn the new concepts. The results on the ScenicOrNot dataset show that the proposed method can be used to explore which concepts, among those encoded by a dictionary of word embeddings, are related to the task of scenicness estimation.
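The comparison of visual vectors and word embedding vectors can be sketched as follows. The paper uses a semi-supervised manifold alignment technique; as a simplified stand-in, the snippet learns a linear map from visual space to word space from a few paired anchors via least squares, then proposes concept candidates by nearest-neighbour search among word vectors. All names, dimensions and the toy dictionary are hypothetical illustrations, not the authors' data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy word-embedding dictionary (in practice: pretrained embeddings such as
# GloVe or word2vec, covering an entire vocabulary).
words = ["mountain", "lake", "forest", "road", "factory", "cliff"]
word_dim, vis_dim = 16, 32
word_vecs = rng.normal(size=(len(words), word_dim))

# Hypothetical paired anchors: visual features of images whose concept label
# is known, used to fit a linear map W from visual to word space.
# (A simplified stand-in for the paper's semi-supervised manifold alignment.)
W_true = rng.normal(size=(vis_dim, word_dim))
anchor_vis = rng.normal(size=(6, vis_dim))
anchor_word = anchor_vis @ W_true  # simulated correspondences
W, *_ = np.linalg.lstsq(anchor_vis, anchor_word, rcond=None)

def nearest_concepts(vis_feat, k=3):
    """Map a visual feature into word space and return the k closest words
    by cosine similarity; these are the new concept candidates."""
    q = vis_feat @ W
    sims = (word_vecs @ q) / (np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(q))
    return [words[i] for i in np.argsort(-sims)[:k]]

# Candidate new concepts for some image's (synthetic) feature vector:
candidates = nearest_concepts(rng.normal(size=vis_dim))
```

Because the search runs over the whole embedding vocabulary, it can surface concept candidates that were never annotated in the visual dataset, which is the key benefit over fixed concept lists.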