Video understanding is a central goal of several computer vision problems. To achieve this goal, a video is decomposed into a set of key components, and the interactions between these components are modeled. Human action recognition is a challenging instance of video understanding. Modeling a vocabulary of local image features as a bag of visual words (BoW) is a common approach to extracting the components of an action video. Since, in a video recognition task, there is no direct mapping from raw features to class labels, higher-level visual descriptors and, consequently, more accurate dictionaries are required. Therefore, in order to extract intrinsic shape bases and to capture the temporal structure of an action, in this paper we take advantage of group sparse coding methods. In our proposed BoW method, each video is represented as a histogram of the coefficients obtained from group sparse coding. The main contribution of this study is to explore the geometry of action components via structured sparse coefficients of visual words in real time. Compared with conventional BoW models, our approach offers further advantages: much lower quantization error and a higher-level feature representation that reduces model parameters and memory complexity while preserving temporal structure. We evaluate our method on standard human action datasets, including KTH, Weizmann, UCF-Sports, and UCF50. The experimental results show significant improvements over previously reported methods.
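The pipeline described above, encoding each local descriptor against a dictionary with a group-sparsity penalty and pooling the coefficients into a per-video histogram, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dictionary here is random (in practice it would be learned from training descriptors), the group partition and the regularization weight `lam` are arbitrary choices, and the solver is plain ISTA with block soft-thresholding for the ℓ2,1 group-lasso penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

def group_soft_threshold(z, thr, groups):
    """Block soft-thresholding: the proximal operator of the
    l2,1 group-lasso penalty, applied one group of atoms at a time."""
    out = np.zeros_like(z)
    for g in groups:
        norm = np.linalg.norm(z[g])
        if norm > thr:
            out[g] = (1.0 - thr / norm) * z[g]
    return out

def group_sparse_code(x, D, groups, lam=0.1, n_iter=200):
    """ISTA for: min_a 0.5*||x - D a||^2 + lam * sum_g ||a_g||_2."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = group_soft_threshold(a - grad / L, lam / L, groups)
    return a

def video_histogram(descriptors, D, groups, lam=0.1):
    """Sum-pool the group-sparse codes of all local descriptors
    of one video into a single l1-normalised BoW histogram."""
    H = np.zeros(D.shape[1])
    for x in descriptors:
        H += np.abs(group_sparse_code(x, D, groups, lam))
    return H / max(H.sum(), 1e-12)

# Toy demo with synthetic data standing in for local spatio-temporal
# descriptors; dimensions and group count are illustrative only.
d, k, n_groups = 32, 64, 8                 # descriptor dim, dictionary size, groups
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
groups = np.array_split(np.arange(k), n_groups)

descriptors = rng.standard_normal((50, d)) # 50 descriptors from one "video"
hist = video_histogram(descriptors, D, groups)
print(hist.shape)                          # one k-dimensional histogram per video
```

The resulting histograms replace hard vector-quantization assignments: each descriptor contributes fractional weight to a few groups of atoms rather than to a single nearest codeword, which is where the reduced quantization error comes from.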