Abstract

The exponential growth of image data has created a compelling need for innovative tools for managing, retrieving, and visualizing images from large collections. This growth has been made possible by the low cost of storage hardware, the wide availability of digital imaging devices, high-bandwidth communication facilities, and the rapid spread of imaging on the World Wide Web. Many applications, such as digital libraries, image search engines, and medical decision support systems, require effective and efficient techniques to access images based on their content, commonly known as content-based image retrieval (CBIR). CBIR computes the relevance of query and database images from the visual similarity of low-level features (e.g., color, texture, shape, and edge) derived entirely from the images Smeulders et al. (2000); Liua et al. (2007); Datta et al. (2008). Even after almost two decades of intensive research, CBIR systems still lag behind the best text-based search engines of today, such as Google and Yahoo. The main problem is the mismatch between users' requirements, expressed as high-level concepts, and the low-level representation of images; this is the well-known “semantic gap” problem Smeulders et al. (2000).

In an effort to narrow the semantic gap, some recent approaches apply machine learning to locally computed image features, treating them as visual concepts in a “bag of concepts” image representation scheme Liua et al. (2007). These models carry the visual analogue of a word in a text document (hence “bag of words”) over to images by automatically extracting predominant color or texture patches, or semantic patches such as water, sand, sky, and cloud, from natural photographic images. This intermediate semantic-level representation is introduced as a first step toward bridging the semantic gap between low-level features and high-level concepts. Recent work has shown that local features represented as bags of words are well suited to scene classification, achieving impressive levels of performance Zhu et al. (2002); Lim (2002); Jing et al. (2004); Vogel & Schiele (2007); Shi et al. (2004); Rahman et al. (2009a). For example, Zhu et al. (2002) propose a framework that automatically generates visual terms (“keyblocks”) by applying a vector quantization or clustering technique, and then represents images, analogously to the bag-of-words representation, in a correlation-enhanced feature space. For reliable identification of image elements, the work in Lim (2002) manually identifies visual patches (“visual keywords”) from sample images. In Jing et al. (2004), a compact and sparse representation of images is proposed based on a region codebook generated by a clustering technique. A semantic
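The codebook-based “bag of concepts” idea described above can be illustrated with a minimal sketch: local descriptors from a collection of images are clustered (e.g., with k-means, standing in for the vector quantization step used in works such as Zhu et al. (2002) and Jing et al. (2004)), each descriptor is mapped to its nearest cluster center (“visual word”), and every image is summarized as a histogram of visual-word occurrences that can be compared for retrieval. The code below is an illustrative assumption, not the cited authors' implementation; the synthetic descriptors, codebook size, and similarity measure are all placeholders.

```python
# Minimal bag-of-visual-words sketch (illustrative only).
# Assumption: each image yields a set of local descriptors; here random
# vectors stand in for color/texture patch features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic "local descriptors": each image contributes a variable number
# of 16-dimensional patch features.
images = [rng.normal(size=(rng.integers(50, 100), 16)) for _ in range(20)]

# 1) Build the codebook: cluster all descriptors into k visual words.
k = 32  # assumed, tunable codebook size
all_descriptors = np.vstack(images)
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

# 2) Represent each image as a normalized histogram of visual-word assignments.
def bag_of_words(descriptors):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

signatures = np.array([bag_of_words(d) for d in images])

# 3) Retrieval: rank database images by cosine similarity to a query image.
query = signatures[0]
similarity = signatures @ query / (
    np.linalg.norm(signatures, axis=1) * np.linalg.norm(query) + 1e-12
)
ranking = np.argsort(-similarity)
print("Images ranked by similarity to the query:", ranking[:5])
```

In a real system the synthetic descriptors would be replaced by actual local features (e.g., color or texture statistics of image patches), and the histogram could be weighted or correlation-enhanced as in the cited approaches.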
