Abstract
Recent work in the computational modeling of visual attention has demonstrated that a purely bottom-up approach to identifying salient regions within an image can be successfully applied to diverse practical problems, from target recognition to advertisement placement. This paper applies a combination of computational models of visual attention to the image retrieval problem. We demonstrate that certain shortcomings of existing content-based image retrieval solutions can be addressed by a biologically motivated, unsupervised way of grouping together images whose salient regions of interest (ROIs) are perceptually similar, regardless of the visual contents of other (less relevant) parts of the image. We propose a model in which only the salient regions of an image are encoded as ROIs; their features are then compared against previously seen ROIs, and cluster membership is assigned accordingly. Experimental results show that the proposed approach works well for several combinations of feature extraction techniques and clustering algorithms, suggesting promising avenues for future improvement, such as the addition of a top-down component and the inclusion of a relevance feedback mechanism.
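To make the pipeline concrete, the following Python sketch illustrates the general idea under stated assumptions: a crude center-surround saliency proxy stands in for the paper's attention model, connected salient blobs become ROIs, and each ROI feature vector is assigned to the nearest existing cluster or starts a new one. The helper names and the distance threshold are illustrative, not the paper's actual implementation.

    import numpy as np
    from scipy.ndimage import gaussian_filter, label, find_objects

    def saliency_map(gray):
        """Crude bottom-up saliency proxy: a center-surround difference of
        Gaussian-blurred copies. A stand-in for a full attention model."""
        center = gaussian_filter(gray.astype(float), sigma=2)
        surround = gaussian_filter(gray.astype(float), sigma=16)
        return np.abs(center - surround)

    def extract_rois(gray, frac=0.5):
        """Crop rectangular patches around connected blobs whose saliency
        exceeds a fraction of the map's maximum."""
        sal = saliency_map(gray)
        labeled, _ = label(sal > frac * sal.max())
        return [gray[region] for region in find_objects(labeled)]

    def assign_to_cluster(feature, clusters, threshold=0.5):
        """Assign an ROI feature vector to the closest cluster centroid,
        creating a new cluster when no centroid is close enough."""
        if clusters:
            dists = [np.linalg.norm(feature - c["centroid"]) for c in clusters]
            best = int(np.argmin(dists))
            if dists[best] < threshold:
                clusters[best]["members"].append(feature)
                clusters[best]["centroid"] = np.mean(clusters[best]["members"], axis=0)
                return best
        clusters.append({"centroid": feature, "members": [feature]})
        return len(clusters) - 1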
Highlights
The dramatic growth in the number of digital images available for consumption, together with the popularity of inexpensive hardware and software for acquiring, storing, and distributing images, has fostered considerable research activity in the field of content-based image retrieval (CBIR) [1] during the past decade [2, 3].
Some of the open problems include the gap between the image features that can be extracted with image processing algorithms and the semantic concepts to which they may be related, the lack of widely adopted testbeds and benchmarks [8, 9], and the inflexibility and poor functionality of most existing user interfaces.
The proposed system allows any combination of feature extraction algorithms commonly used in CBIR (e.g., color histograms, color correlograms, Tamura texture descriptors, and Fourier shape descriptors) to be applied on a region-by-region basis.
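As a minimal, hypothetical example of one such descriptor computed per region, the Python sketch below builds a normalized joint RGB color histogram for a single ROI; the bin count and the quantization scheme are illustrative choices, not the paper's.

    import numpy as np

    def color_histogram(roi_rgb, bins=8):
        """Normalized joint RGB histogram for one ROI: quantize each channel
        into `bins` levels and count joint occurrences."""
        q = (roi_rgb.astype(int) // (256 // bins)).reshape(-1, 3)
        # Map each quantized (r, g, b) triple to a single histogram index.
        index = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
        hist = np.bincount(index, minlength=bins ** 3).astype(float)
        return hist / hist.sum()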
Summary
The dramatic growth in the number of digital images available for consumption and the popularity of inexpensive hardware and software for acquiring, storing, and distributing images have fostered considerable research activity in the field of content-based image retrieval (CBIR) [1] during the past decade [2, 3]. A content-based search engine translates the information supplied by the user into a query against the database (based on previously extracted and stored indexes) and retrieves the candidates that are most likely to satisfy the user's request. Other approaches take into account the fact that, in many cases, users are searching for regions or objects of interest rather than the entire picture. This has led to a number of proposed solutions that do not treat the image as a whole but instead deal with portions (regions or blobs) within an image, such as [10, 11], or focus on objects of interest [12]. Like its text-based counterpart, an image retrieval system must be able to interpret the contents of the documents (images) in a collection and rank them according to their degree of relevance to the user query. The interpretation process involves extracting semantic information from the documents (images) and using this information to match the user's needs [17].
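A hypothetical sketch of this query-and-rank step, assuming each database image has already been indexed as a list of ROI feature vectors: the query's ROI features are compared against every indexed ROI, and images are ranked by their closest match. The single-best-match scoring rule is an assumption for illustration, not necessarily the paper's exact relevance measure.

    import numpy as np

    def rank_images(query_rois, index):
        """Rank database images by the smallest distance between any query
        ROI feature vector and any indexed ROI feature vector for that image.
        `index` maps an image id to its list of ROI feature vectors."""
        scores = {
            image_id: min(
                np.linalg.norm(np.asarray(q) - np.asarray(r))
                for q in query_rois
                for r in rois
            )
            for image_id, rois in index.items()
        }
        # Most relevant (smallest distance) first.
        return sorted(scores, key=scores.get)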