Visual Codebook Research Articles

Inspired by the success of textual words in large-scale textual information processing, researchers are trying to extract visual words from images that function similarly to textual words. Visual words are commonly generated by clustering a large number of local image features and taking the cluster centers as visual words. This approach is simple and scalable, but it produces noisy visual words. Many works have tried to improve the descriptive and discriminative ability of visual words. This paper gives a comprehensive survey of visual vocabulary and details several state-of-the-art algorithms. A comprehensive review and summary of related work on visual vocabulary is presented first. Then, we introduce our recent algorithms for descriptive and discriminative visual word generation, i.e., latent visual context analysis for descriptive visual word identification [74], descriptive visual words and visual phrases generation [68], contextual visual vocabulary, which combines both semantic contexts and spatial contexts [69], and visual vocabulary hierarchy optimization [18]. Additionally, we introduce two post-processing strategies that further improve the performance of visual vocabulary: spatial coding [73], which efficiently removes mismatched visual words between images for more reasonable image similarity computation, and user-preference-based visual word weighting [44], which makes the image similarity computed from visual words more consistent with users' preferences or habits.
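The baseline described in this abstract (cluster local features, take the cluster centers as visual words) can be sketched as follows. This is a minimal illustration using plain Lloyd's k-means in pure Python, not the survey's own implementation; the function names, the toy 2-D "features", and the parameters `k`, `iterations`, and `seed` are all assumptions made for the example. Real systems would cluster high-dimensional descriptors with an optimized library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(feature, codebook):
    """Index of the visual word (cluster center) closest to a feature."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(feature, codebook[i]))


def build_codebook(features, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; the returned centers act as visual words."""
    rng = random.Random(seed)
    # Initialize the centers with k distinct features chosen at random.
    codebook = [list(f) for f in rng.sample(features, k)]
    for _ in range(iterations):
        # Assignment step: attach every feature to its nearest center.
        clusters = [[] for _ in range(k)]
        for f in features:
            clusters[nearest_word(f, codebook)].append(f)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def bag_of_words(image_features, codebook):
    """L1-normalized histogram of visual word occurrences for one image."""
    counts = [0] * len(codebook)
    for f in image_features:
        counts[nearest_word(f, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy data: two well-separated groups of hypothetical 2-D local features.
training_features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = build_codebook(training_features, k=2)
histogram = bag_of_words([(0, 0), (10, 10), (11, 10)], codebook)
```

Each image is then represented by its histogram over the codebook, which is what makes the approach scalable. The noise the abstract mentions arises because nothing in this procedure uses labels or spatial context when placing the centers.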

The extraction and quantization of local image and video descriptors for the subsequent creation of visual codebooks is a technique that has proved very effective for image and video retrieval applications. In this paper we build on this concept and propose a new set of visual descriptors that provide a local space-time description of the visual activity. The proposed descriptors are extracted at spatiotemporal salient points detected on the estimated optical flow field for a given image sequence and are based on geometrical properties of three-dimensional piecewise polynomials, namely B-splines. The B-splines are fitted on the spatiotemporal locations of salient points that fall within a given spatiotemporal neighborhood. Our descriptors are invariant to translation and scaling in space-time. Scale invariance is ensured by coupling the neighborhood dimensions to the scale at which the corresponding spatiotemporal salient points are detected. In addition, to provide robustness against camera motion (e.g., global translation due to camera panning), we subtract the motion component that is estimated by applying local median filters on the optical flow field. The descriptors extracted across the whole dataset are clustered to create a codebook of ‘visual verbs’, where each verb corresponds to a cluster center. We use the resulting codebook in a ‘bag of verbs’ approach to represent the motion of the subjects within small temporal windows. Finally, we use a boosting algorithm to select the most discriminative temporal windows of each class and Relevance Vector Machines (RVM) for classification. The presented results on three different databases of human actions verify the effectiveness of our method.
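The camera-motion compensation step mentioned in this abstract (subtracting a motion component estimated with local median filters on the optical flow field) might look like the sketch below. The abstract does not give the window size or border handling, so the `radius` parameter, the clipped-window border policy, and the per-component median are assumptions for illustration only.

```python
from statistics import median


def compensate_camera_motion(flow, radius=1):
    """Subtract a local median estimate of global motion from a flow field.

    flow: 2-D grid (list of rows) of (u, v) optical flow vectors.
    radius: half-size of the square median window (an assumed parameter).
    """
    rows, cols = len(flow), len(flow[0])
    residual = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the neighborhood, clipping the window at the borders.
            window = [flow[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            # The median of each component approximates the dominant
            # (camera-induced) motion around this pixel.
            med_u = median(u for u, _ in window)
            med_v = median(v for _, v in window)
            u, v = flow[r][c]
            out_row.append((u - med_u, v - med_v))
        residual.append(out_row)
    return residual


# Toy field: a uniform pan of (2, 0) with one independently moving pixel.
flow = [[(2.0, 0.0)] * 5 for _ in range(5)]
flow[2][2] = (5.0, 1.0)
residual = compensate_camera_motion(flow)
```

Because the median ignores a minority of outlying vectors, the uniform pan is removed while the independently moving pixel keeps its relative motion. A moving region larger than half the window would itself dominate the median, so the window size matters in practice.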

Articles published on Visual Codebook

A Review of Codebook Models in Patch-Based Visual Object Recognition

Improvements in image categorization using codebook ensembles

Creating Efficient Visual Codebook Ensembles for Object Categorization

Efficient and Effective Visual Codebook Generation Using Additive Kernels

An Adaptive Algorithm for Robust Visual Codebook Generation and Its Natural Scene Categorization Application

Building descriptive and discriminative visual codebook for large-scale image applications

Latent visual context learning for web image applications

Visual codebook construction for class-specific recognition

Sparse B-spline polynomial descriptors for human activity recognition

Combining visual dictionary, kernel-based similarity and learning strategy for image category retrieval
