Abstract

In this paper, we study how to extract visual concepts to understand landscape scenicness. Using visual feature representations from a Convolutional Neural Network (CNN), we learn a number of Concept Activation Vectors (CAVs) aligned with semantic concepts from ancillary datasets. These concepts represent objects, attributes or scene categories that describe outdoor images. We then use these CAVs to study their impact on the (crowdsourced) perception of beauty of landscapes in the United Kingdom. Finally, we deploy a technique to explore new concepts beyond those initially available in the ancillary dataset: using a semi-supervised manifold alignment technique, we align the CNN image representation to a large set of word embeddings, thereby giving access to entire dictionaries of concepts. This allows us to obtain a list of new concept candidates to improve our understanding of the elements that contribute the most to the perception of scenicness. We do this without the need for any additional data, by leveraging the commonalities in the visual and word vector spaces. Our results suggest that new and potentially useful concepts can be discovered by leveraging neighbourhood structures in the word vector spaces.
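The abstract's alignment step can be illustrated with a toy sketch. The paper uses semi-supervised manifold alignment; as a hedged, linear simplification we show an orthogonal Procrustes fit between paired "anchor" vectors from the two spaces. All data here are synthetic stand-ins, not the paper's actual representations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins: word embeddings and CNN concept vectors for the same
# concepts (the anchor pairs a semi-supervised alignment would use).
d = 6
true_rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
word_anchor = rng.normal(size=(20, d))
image_anchor = word_anchor @ true_rotation  # pretend the visual space is a rotated word space

# Orthogonal Procrustes: find the rotation mapping image vectors onto
# word vectors (a linear simplification of the paper's manifold alignment).
u, _, vt = np.linalg.svd(image_anchor.T @ word_anchor)
R = u @ vt

aligned = image_anchor @ R
err = np.linalg.norm(aligned - word_anchor)
print(round(float(err), 6))
```

Once such a mapping exists, any image representation can be projected into the word-embedding space and compared against an entire dictionary of concepts.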

Highlights

  • The combination of advances in deep learning methods, in particular in the form of deep Convolutional Neural Networks (CNNs), and the abundance of User Generated Content (UGC) opens up the possibility of studying how users perceive their surroundings at an unprecedented level of detail and at scale

  • We propose a methodology to explore a broad range of concepts that relate to landscape scenicness by using Concept Activation Vectors (CAVs) computed from a visual dataset of generic concepts

  • We explore the use of cross-domain manifold alignment to enlarge the concept space with a corpus of word embeddings
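The CAV computation mentioned in the highlights can be sketched as follows: a CAV is the unit normal of a linear classifier separating activations of images that contain a concept from random counterexamples. The activations, dimensions and learning rate below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CNN activations for images annotated with a concept
# (e.g. "mountain") and for random counterexamples; the dimensionality
# and sample counts are illustrative only.
concept_acts = rng.normal(loc=1.0, size=(50, 16))
random_acts = rng.normal(loc=0.0, size=(50, 16))

X = np.vstack([concept_acts, random_acts])
y = np.concatenate([np.ones(50), np.zeros(50)])

# Train a linear classifier (logistic regression by gradient descent);
# the CAV is the unit-norm normal vector of the separating hyperplane.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

cav = w / np.linalg.norm(w)

# Images containing the concept should project higher on the CAV.
gap = (concept_acts @ cav).mean() - (random_acts @ cav).mean()
print(round(float(gap), 3))
```

Projections of new images onto such a vector then quantify how strongly the concept is expressed, which is what links each concept to the scenicness scores.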

Introduction

The combination of advances in deep learning methods, in particular in the form of deep Convolutional Neural Networks (CNNs), and the abundance of User Generated Content (UGC) opens up the possibility of studying how users perceive their surroundings at an unprecedented level of detail and at scale. If we are interested in estimating the scenic value of a place, we would like this estimation to be independent of confounding factors such as photographer biases or specific lighting conditions. Such an approach relies on a manually chosen set of concepts that are expected to relate to scenicness and an auxiliary dataset of images in which the presence of these concepts is annotated. We achieve this by expanding the set of concepts available in a visual dataset of concepts (e.g., materials, textures, objects) with a series of new concepts found by comparing visual vectors (from the CNN feature space) with word embedding vectors. This is in contrast to existing methods for concept discovery, where either new concepts are not assigned a semantic label [5], or a human-in-the-loop system [6] or per-image annotations [7] are used in order to learn the new concepts. The results on the ScenicOrNot dataset show that the proposed method can be used to explore which concepts are related to the task of scenicness estimation among those encoded by a dictionary of word embeddings.
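The concept-discovery step described above amounts to a nearest-neighbour search over a dictionary of word embeddings. The sketch below uses a synthetic vocabulary and random vectors; in the paper the query would be a CAV mapped into the word space by manifold alignment, so every name and vector here is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy aligned space: word embeddings for a small vocabulary and a query
# direction (in the paper, a visual vector brought into the word space).
vocab = ["mountain", "lake", "car park", "cliff", "motorway", "valley"]
word_vecs = rng.normal(size=(len(vocab), 8))
query = word_vecs[vocab.index("mountain")]  # pretend this is the aligned visual vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank the vocabulary by cosine similarity to the query: the top-ranked
# words become candidate new concepts for scenicness.
ranked = sorted(vocab, key=lambda w: -cosine(word_vecs[vocab.index(w)], query))
print(ranked[0])
```

Because the search only reuses existing word vectors, no additional annotated images are needed to propose the new concept candidates, matching the "no additional data" claim in the abstract.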

Methodology
Concept Activation Vectors
Linking CAVs to Scenicness
Exploring New Concepts with Manifold Alignment
Datasets
Semantic Concepts
Word Embeddings
Deriving CAVs from Broden
Linking CAV Concepts to Scenicness
Discovering New Concepts with Word Embeddings
Conclusions