Visual Codebook Research Articles

Natural visual scenes consist of objects with various physical properties that are arranged in three-dimensional (3D) space in a variety of ways. When projected onto the retina, visual scenes entail highly structured statistics, occurring over the full range of natural variations in the world. To deal efficiently with this full range of natural variations, the visual system may have to generate percepts according to the probability distributions (PDs) of visual variables underlying any stimulus [1-5]. What, then, are these PDs in natural scenes? In this work, we propose that these PDs are the components of the grand joint PD of the physical world and the images on the retina (GPDWI). Thus, our approach is to decompose GPDWI into a large set of PDs without information reduction. We call these PDs a visual code book. To examine this visual code book, we first sampled a large number of scene patches (~2 degrees of visual angle) from a database of high-resolution 3D natural scenes and fitted a concatenation of 8th-order polynomial functions to the 2D and 3D data in the patches. Using the fitted polynomial functions, we classified all the natural scene patches into a large set of 2D-3D natural scene structures that have distinctive distributions of ranges and/or luminance. Finally, we developed a PD for each of the 2D-3D natural scene structures. Since any joint 2D-3D natural scene patch is a combination of the samples of these PDs, they form a faithful representation of GPDWI and can be used as a visual code book. We used these PDs to estimate 3D scenes from 2D images and to categorize 2D-3D natural scenes. Our results showed that accurate 3D vision from a single monocular view is achievable in many situations and that near-human performance can be achieved in categorizing 2D-3D natural scenes. We thus conclude that the visual code book obtained here faithfully captures the extraordinarily complex 2D-3D natural scene statistics and supports a range of tasks of natural vision.
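The polynomial-fitting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single 1-D profile across a patch and uses synthetic data; the abstract does not specify how the polynomials are concatenated or which variables they are fitted to, so the profile, the sample count, and all variable names below are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical 1-D profile across a patch: 32 samples on [-1, 1].
# The values are synthetic (a smooth curve plus small noise), not real scene data.
x = np.linspace(-1.0, 1.0, 32)
rng = np.random.default_rng(0)
profile = 0.5 + 0.3 * x - 0.8 * x**2 + 0.01 * rng.standard_normal(x.size)

# Least-squares fit of an 8th-order polynomial (9 coefficients).
fit = Polynomial.fit(x, profile, deg=8)

# Coefficients in the plain power basis, lowest order first.
coeffs = fit.convert().coef

# Root-mean-square error of the fit on the sampled points.
residual = profile - fit(x)
rms_error = float(np.sqrt(np.mean(residual**2)))
```

In a sketch like this, the coefficient vector `coeffs` is a compact description of the patch profile, and vectors of this kind could then be grouped into structure classes. Whether the original work groups patches this way is an assumption.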

A visual codebook serves as a fundamental component in many state-of-the-art computer vision systems. Most existing codebooks are built by quantizing local feature descriptors extracted from training images. Subsequently, each image is represented as a high-dimensional bag-of-words histogram. Such a highly redundant image description, in which only a few sparsely distributed bins are nonzero, is inefficient for both storage and retrieval. Furthermore, most existing codebooks are built solely from the visual statistics of local descriptors, without considering the supervised labels coming from the subsequent recognition or classification tasks. In this paper, we propose a task-dependent codebook compression framework to handle these two problems. First, we propose to learn a compression function that maps an originally high-dimensional codebook into a compact codebook while maintaining its visual discriminability. This is achieved by a codeword sparse coding scheme with Lasso regression, which minimizes the descriptor distortions of training images after codebook compression. Second, we propose to adapt our codebook compression to the subsequent recognition or classification tasks. This is achieved by introducing a label constraint kernel (LCK) into our compression loss function. In particular, our LCK can model heterogeneous kinds of supervision, i.e., (partial) category labels, correlative semantic annotations, and image query logs. We validated our codebook compression in three computer vision tasks: 1) object recognition in PASCAL Visual Object Class 07; 2) near-duplicate image retrieval in UKBench; and 3) web image search in a collection of 0.5 million Flickr photographs. Our compressed codebook has shown superior performance over several state-of-the-art supervised and unsupervised codebooks.
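The baseline pipeline that the abstract starts from (quantizing local descriptors into a codebook, then describing an image as a bag-of-words histogram) can be sketched as follows. This is a minimal illustration using plain k-means on made-up 2-D "descriptors"; the function names are hypothetical, and the Lasso-based compression and the label constraint kernel from the paper are not implemented here.

```python
from math import dist  # Euclidean distance, Python 3.8+

def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))

def kmeans_codebook(descriptors, init_codebook, n_iter=10):
    """Refine an initial codebook with plain Lloyd iterations (k-means)."""
    codebook = [list(c) for c in init_codebook]
    for _ in range(n_iter):
        clusters = [[] for _ in codebook]
        for d in descriptors:
            clusters[nearest_codeword(d, codebook)].append(d)
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def bow_histogram(descriptors, codebook):
    """Quantize descriptors and return an L1-normalized word histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Toy 2-D "descriptors" forming two well-separated groups (made-up data).
train = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
codebook = kmeans_codebook(train, init_codebook=[(0.0, 0.0), (5.0, 5.0)])

# Descriptors of one hypothetical image: one near the first group, three near the second.
image_descriptors = [(0.1, 0.1), (4.9, 5.1), (5.1, 4.9), (5.0, 5.0)]
histogram = bow_histogram(image_descriptors, codebook)
```

With a realistic codebook of thousands of words, most histogram bins for any single image are zero, which is the redundancy the abstract points to as motivation for compression.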

Articles published on Visual Codebook

A Hierarchical Word-Merging Algorithm with Class Separability Measure

Traffic sign recognition using group sparse coding

A method of protein model classification and retrieval using bag-of-visual-features.

Fast and efficient visual codebook construction for multi-label annotation using predictive clustering trees

Spatiotemporal bag-of-features for early wildfire smoke detection

Heterogeneous Visual Codebook Integration Via Consensus Clustering for Visual Categorization

Hand gesture recognition based on bag of features and support vector machine

Visual word coding based on difference maximization

Image classification using spatial pyramid robust sparse coding

Geographic Image Retrieval Using Local Invariant Features

A Generalized Probabilistic Framework for Compact Codebook Creation.

High-Order Local Spatial Context Modeling by Spatialized Random Forest

Scene classification using a multi-resolution bag-of-features model

Fast human action classification and VOI localization with enhanced sparse coding

A visual code book-structured probability distributions in natural scenes

Dynamic two-stage image retrieval from large multimedia databases

Weakly supervised codebook learning by iterative label propagation with graph quantization

No-Reference Image Quality Assessment Using Visual Codebooks

Mining Visual Collocation Patterns via Self-Supervised Subspace Learning

Task-Dependent Visual-Codebook Compression
