Abstract

The performance of any scene categorization system depends on the scene representation algorithm it uses. In recent years, the Bag of Visual Words (BoVW) approach has become a widely adopted method for this task. Nevertheless, the BoVW approach has several flaws. First, the K-means clustering algorithm used to build the visual dictionary relies solely on the Euclidean distance. Second, the size of the visual vocabulary is a user-supplied parameter, which is impractical because the final categorization depends critically on the chosen number of visual words. Finally, assigning each descriptor to exactly one visual word is unrealistic because it ignores the uncertainty present at the image descriptor level. In this paper, we propose a simple solution to these problems. Our algorithm uses the Asymmetric Generalized Gaussian mixture (AGGM) to model the distribution of the visual words. This choice is motivated by the fact that the Asymmetric Generalized Gaussian distribution (AGGD) can fit the different shapes of observed non-Gaussian and asymmetric data. To determine the number of visual words automatically, which in our case is the number of mixture components, we employ the Minimum Message Length (MML) criterion. We also propose a soft assignment that uses the probability of each descriptor belonging to each visual word, thereby accounting for the uncertainty at the image descriptor level. We validate the proposed algorithm by applying it to scene categorization.

Keywords (machine-generated): Visual Word, Scale Invariant Feature Transform, Scene Categorization, Visual Vocabulary, Codebook Size
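The soft-assignment idea described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the AGGD is written in a commonly used parameterization (mean `mu`, left and right scales `sigma_l` and `sigma_r`, shape `beta`), descriptor dimensions are treated as independent, and the mixture parameters are taken as given rather than estimated with MML. The function names `aggd_logpdf`, `soft_assign`, and `soft_histogram` are hypothetical and chosen only for illustration.

```python
import math
import numpy as np

# math.lgamma is scalar-only; wrap it so it accepts per-dimension arrays.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def aggd_logpdf(x, mu, sigma_l, sigma_r, beta):
    """Elementwise log-density of an asymmetric generalized Gaussian.

    Assumed parameterization: the left scale applies where x < mu, the
    right scale elsewhere; beta = 2 with equal scales gives a Gaussian.
    """
    x, mu, sigma_l, sigma_r, beta = (
        np.asarray(v, dtype=float) for v in (x, mu, sigma_l, sigma_r, beta)
    )
    log_ratio = _lgamma(3.0 / beta) - _lgamma(1.0 / beta)
    a = np.exp(0.5 * beta * log_ratio)
    log_norm = (np.log(beta) + 0.5 * log_ratio
                - np.log(sigma_l + sigma_r) - _lgamma(1.0 / beta))
    scale = np.where(x < mu, sigma_l, sigma_r)
    return log_norm - a * (np.abs(x - mu) / scale) ** beta


def soft_assign(descriptors, weights, mus, sigma_ls, sigma_rs, betas):
    """Posterior probability of each descriptor under each visual word.

    descriptors has shape (N, D); every component has per-dimension
    parameters of shape (D,). Dimensions are assumed independent.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    log_post = np.stack(
        [
            np.log(w) + aggd_logpdf(descriptors, m, sl, sr, b).sum(axis=1)
            for w, m, sl, sr, b in zip(weights, mus, sigma_ls, sigma_rs, betas)
        ],
        axis=1,
    )
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def soft_histogram(descriptors, weights, mus, sigma_ls, sigma_rs, betas):
    """Image signature: mean posterior mass received by each visual word."""
    post = soft_assign(descriptors, weights, mus, sigma_ls, sigma_rs, betas)
    return post.mean(axis=0)
```

In contrast to the hard assignment of standard BoVW, where each descriptor adds one count to its nearest word, each descriptor here spreads a unit of mass across all words in proportion to its posterior probabilities, so the resulting histogram sums to one.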
