Intelligent Scene Analysis and Recognition

Kai-Kuang Ma

doi:10.21236/ada519403

Abstract

Knowing the name of a location and the user's position relative to nearby landmarks helps the end user navigate, and it also makes follow-up geographical services possible. This report addresses Visual Location Recognition and Registration (VLRR): predicting the name of a location, and the user's position relative to it, using only a captured query image. The problem is nearly ill-posed for two reasons. First, there is no formal definition of what constitutes a location, and it is still unclear which properties of a location make recognition possible. Second, determining the end user's relative position requires registering images across large viewpoint variations, and such registration suffers severely from the well-known matching ambiguity.

To address the first difficulty, the report uses a Bag-of-Features (BoF) model based on a visual codebook. The codebook is obtained by unsupervised clustering of local image features extracted from the training images. Each location and each query image can then be represented efficiently as a histogram of how often each visual word in the codebook appears. Finally, a classifier trained by supervised learning makes the decision based on the similarity of these histograms. However, the BoF model does not account for the fact that different visual words have different discriminative power in the subsequent location classification. The report therefore proposes a simple, novel weighting scheme called Visual Words Aggregation Weighting (VWAW). VWAW assumes that certain visual words are more important than others: those that are cluster centers of highly aggregated local image features and that have fewer neighboring words. Both assumptions are reasonable. A highly aggregated cluster center usually has a smaller clustering error, and a visual word with fewer neighbors is more discriminative and robust.
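The abstract does not specify the local descriptor, the clustering algorithm, the classifier, or the exact VWAW formula. The following is only a minimal Python sketch of a weighted BoF pipeline, and it rests on several assumptions. Toy 2-D points stand in for real local descriptors such as SIFT. Lloyd's k-means with farthest-point seeding builds the codebook. A 1-nearest-neighbour classifier using histogram intersection stands in for the report's supervised classifier. Finally, `aggregation_weights` is a hypothetical weighting that only mirrors VWAW's two stated assumptions: a smaller intra-cluster spread and fewer neighbouring words both raise a word's weight. It is not the report's formula.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(desc, centers):
    """Index of the visual word (cluster center) nearest to a descriptor."""
    return min(range(len(centers)), key=lambda j: sq_dist(desc, centers[j]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[quantize(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(c) / len(g) for c in zip(*g)]
    return centers


def aggregation_weights(points, centers, radius):
    """HYPOTHETICAL VWAW-style weights: tight clusters and isolated words score higher."""
    k = len(centers)
    assign = [quantize(p, centers) for p in points]
    weights = []
    for j in range(k):
        members = [p for p, a in zip(points, assign) if a == j]
        # Mean member-to-center distance, used here as the clustering error.
        spread = (sum(math.sqrt(sq_dist(p, centers[j])) for p in members) / len(members)
                  if members else float("inf"))
        # Number of other visual words lying within `radius` of this one.
        neighbours = sum(1 for i in range(k)
                         if i != j and math.sqrt(sq_dist(centers[i], centers[j])) < radius)
        weights.append(1.0 / ((1.0 + spread) * (1.0 + neighbours)))
    return weights


def bof_histogram(descs, centers, weights=None):
    """L1-normalised (optionally weighted) visual-word histogram of one image."""
    counts = Counter(quantize(d, centers) for d in descs)
    w = weights if weights is not None else [1.0] * len(centers)
    h = [counts[j] * w[j] for j in range(len(centers))]
    total = sum(h) or 1.0
    return [v / total for v in h]


def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))


def classify(query_hist, train):
    """train: list of (location_name, histogram); 1-NN by histogram intersection."""
    return max(train, key=lambda item: intersection(query_hist, item[1]))[0]


# Toy usage: each "location" is a set of descriptors drawn around two points.
rng = random.Random(42)


def blob(cx, cy, n=30):
    return [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)] for _ in range(n)]


train_images = {"A": blob(0, 0) + blob(10, 0), "B": blob(0, 10) + blob(10, 10)}
all_descs = [d for descs in train_images.values() for d in descs]
codebook = kmeans(all_descs, k=4)
w = aggregation_weights(all_descs, codebook, radius=5.0)
train = [(name, bof_histogram(d, codebook, w)) for name, d in train_images.items()]
query = blob(0, 0, 10) + blob(10, 0, 10)
print(classify(bof_histogram(query, codebook, w), train))  # prints "A"
```

The weights multiply each word's count before the histogram is normalised, so a discriminative word contributes more to the similarity score than a word shared by many locations. Any real VWAW implementation would substitute the report's own definitions of aggregation and neighbourhood.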
