Abstract

In image classification based on the bag-of-visual-words framework, the image patches used to build image representations significantly affect classification performance. Currently, however, patches are mostly sampled by processing low-level image information, or are simply extracted regularly or randomly. These methods are not effective, because the patches they extract are not necessarily discriminative for image categorization. In this paper, we propose to extract image patches using both bottom-up information, obtained by processing low-level image information, and top-down information, obtained by exploring the statistical properties of training image grids. In the proposed approach, an input image is divided into regular grids, each of which is evaluated on its bottom-up information, its top-down information, or both. Each grid is then assigned a saliency value according to its evaluation result, which yields a saliency map for the image. Finally, patches are sampled from the input image on the basis of this saliency map. We also propose a method for fusing the two kinds of information. The proposed methods are evaluated on both object categories and scene categories, and the experimental results demonstrate their effectiveness.
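
The pipeline described above (grid division, per-grid saliency, fusion, saliency-driven sampling) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the bottom-up measure, the top-down statistic, or the fusion rule, so the sketch assumes intensity variance as a stand-in bottom-up cue, takes the top-down map as a given input, and uses a hypothetical convex combination with weight `alpha` for fusion. All function names are illustrative.

```python
import random
from statistics import pvariance


def bottom_up_saliency(image, cell):
    """Variance of pixel intensities per grid cell (assumed stand-in cue)."""
    rows, cols = len(image) // cell, len(image[0]) // cell
    saliency = []
    for r in range(rows):
        row = []
        for c in range(cols):
            pixels = [image[y][x]
                      for y in range(r * cell, (r + 1) * cell)
                      for x in range(c * cell, (c + 1) * cell)]
            row.append(pvariance(pixels))
        saliency.append(row)
    return saliency


def normalize(grid_map):
    """Rescale a map to [0, 1]; a constant map becomes all ones."""
    flat = [v for row in grid_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[1.0 for _ in row] for row in grid_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in grid_map]


def fuse(bottom_up, top_down, alpha=0.5):
    """Hypothetical fusion rule: convex combination of the normalized maps."""
    bu, td = normalize(bottom_up), normalize(top_down)
    return [[alpha * b + (1 - alpha) * t for b, t in zip(brow, trow)]
            for brow, trow in zip(bu, td)]


def sample_patches(saliency, n_patches, cell, seed=None):
    """Return top-left (y, x) corners of cells drawn in proportion to saliency."""
    rng = random.Random(seed)
    cells = [(r, c)
             for r in range(len(saliency))
             for c in range(len(saliency[0]))]
    weights = [saliency[r][c] for r, c in cells]
    if sum(weights) <= 0:
        weights = [1.0] * len(cells)  # fall back to uniform sampling
    picks = rng.choices(cells, weights=weights, k=n_patches)
    return [(r * cell, c * cell) for r, c in picks]
```

In this sketch, grids with zero fused saliency are never sampled, while more salient grids contribute proportionally more patches. Setting `alpha` to 1 or 0 reduces the fused map to the bottom-up or the top-down map alone.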
