Learning the structure of image collections with latent aspect models

Florent Monay

doi:10.5075/epfl-thesis-3729

Abstract

The approach to indexing an image collection depends on the type of data to organize. Satellite images are likely to be searched with latitude and longitude coordinates, medical images are often searched with an image example that serves as a visual query, and personal image collections are generally browsed by event. A more general retrieval scenario is based on the use of textual keywords to search for images containing a specific object, or representing a given scene type. This requires the manual annotation of each image in the collection to allow for the retrieval of relevant visual information based on a text query. This time-consuming and subjective process is the current price to pay for a reliable and convenient text-based image search. This dissertation investigates the use of probabilistic models to assist the automatic organization of image collections, attempting to link the visual content of digital images with a potential textual description. Relying on robust, patch-based image representations that have proven to capture a variety of visual content, our work proposes to model images as mixtures of latent aspects. These latent aspects are defined by multinomial distributions that capture patch co-occurrence information observed in the collection. An image is not represented by the direct count of its constituent elements, but as a mixture of latent aspects that can be estimated with principled, generative unsupervised learning methods. An aspect-based image representation therefore incorporates contextual information from the whole collection that can be exploited. This emerging concept is explored for several fundamental tasks related to image retrieval (namely classification, clustering, segmentation, and annotation) in what represents one of the first coherent and comprehensive studies of the subject.
We first investigate the possibility of classifying images based on their estimated aspect mixture weights, interpreting latent aspect modeling as an unsupervised feature extraction process. Several image categorization tasks are considered, where images are classified based on the objects present or according to their global scene type. We demonstrate that the concept of latent aspects makes it possible to take advantage of unlabeled data to infer a robust image representation that achieves a higher classification performance than the original patch-based representation. Second, further exploring the concept, we show that aspects can correspond to an interesting soft clustering of an image collection that can serve as a browsing structure. Images can be ranked given an aspect, illustrating the corresponding co-occurrence context visually. Third, we derive a principled method that relies on latent aspects to classify image patches into different categories. This produces an image segmentation based on the resulting spatial class-densities. Finally, we propose to model images and their captions with a single aspect model, merging the co-occurrence contexts of the visual and the textual modalities in different ways. Once a model has been learned, the distribution of words given an unseen image is inferred based on its visual representation, and serves as textual indexing. Overall, we demonstrate with extensive experiments that the co-occurrence context captured by latent aspects is suitable for the above-mentioned tasks, making it a promising approach for multimedia indexing.
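
The idea of representing an image as a mixture of latent aspects, and then inferring a word distribution for an unseen image, can be illustrated with a minimal sketch. It assumes a PLSA-style aspect model fitted with EM, which the abstract does not name explicitly, and it merges the visual and textual modalities by simple concatenation of counts, which is only one of several possible ways. The function names (`fit_aspect_model`, `fold_in`), the toy counts, and all parameters are hypothetical and chosen for illustration only.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def fit_aspect_model(counts, n_aspects, n_iter=100, seed=0):
    """Fit a PLSA-style aspect model with EM (illustrative assumption).

    counts: (n_images, n_terms) array of co-occurrence counts.
    Returns P(term | aspect), shape (n_aspects, n_terms), and
    P(aspect | image), shape (n_images, n_aspects).
    """
    rng = np.random.default_rng(seed)
    n_images, n_terms = counts.shape
    p_term_z = rng.random((n_aspects, n_terms))
    p_term_z /= p_term_z.sum(axis=1, keepdims=True)
    p_z_img = rng.random((n_images, n_aspects))
    p_z_img /= p_z_img.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(aspect | image, term),
        # shape (n_images, n_aspects, n_terms).
        post = p_z_img[:, :, None] * p_term_z[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + EPS
        expected = counts[:, None, :] * post
        # M-step: re-estimate both multinomials from expected counts.
        p_term_z = expected.sum(axis=0)
        p_term_z /= p_term_z.sum(axis=1, keepdims=True) + EPS
        p_z_img = expected.sum(axis=2)
        p_z_img /= p_z_img.sum(axis=1, keepdims=True) + EPS
    return p_term_z, p_z_img


def fold_in(new_counts, p_term_z, n_iter=100):
    """Estimate P(aspect | new image) while keeping P(term | aspect) fixed."""
    n_aspects = p_term_z.shape[0]
    p_z = np.full(n_aspects, 1.0 / n_aspects)
    for _ in range(n_iter):
        post = p_z[:, None] * p_term_z
        post /= post.sum(axis=0, keepdims=True) + EPS
        p_z = (new_counts[None, :] * post).sum(axis=1)
        p_z /= p_z.sum() + EPS
    return p_z


# Toy data (made up): 4 images, 4 visual-word columns then 2 caption-word columns.
counts = np.array([
    [5, 4, 0, 0, 1, 0],
    [4, 6, 0, 0, 1, 0],
    [0, 0, 5, 5, 0, 1],
    [0, 0, 6, 3, 0, 1],
], dtype=float)
n_visual = 4
p_term_z, p_z_img = fit_aspect_model(counts, n_aspects=2)

# Unseen image: only visual counts are known, caption columns are zero.
new_image = np.array([3, 2, 0, 0, 0, 0], dtype=float)
p_z_new = fold_in(new_image, p_term_z)

# P(caption word | image) = sum over aspects of P(aspect | image) * P(word | aspect),
# renormalised over the caption vocabulary.
p_caption = p_z_new @ p_term_z[:, n_visual:]
p_caption /= p_caption.sum() + EPS
```

In this sketch the rows of `p_z_img` play the role of the aspect mixture weights that the abstract uses as image features for classification and clustering, while `p_caption` corresponds to the inferred word distribution used as textual indexing.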
