Abstract
Content-based image retrieval represents images as N-dimensional feature vectors. The k images most similar to a target image, i.e., those closest to its feature vector, are determined by applying a k-nearest-neighbor (k-NN) query. A sequential scan of the feature vectors for k-NN queries is costly for a large number of images when N is high. The search space can be reduced by indexing the data, but the effectiveness of multidimensional indices is poor for high-dimensional data. Building indices on dimensionality-reduced data is one method to improve indexing efficiency. We utilize the Singular Value Decomposition (SVD) method to attain dimensionality reduction (DR) with minimum information loss for static data. Clustered SVD (CSVD) combines clustering with SVD to attain a lower normalized mean square error (NMSE) by taking advantage of the fact that most real-world datasets exhibit local rather than global correlations. The Local Dimensionality Reduction (LDR) method differs from CSVD in that it uses an SVD-friendly clustering method rather than the k-means clustering method. We propose a hybrid method which combines the clustering method of LDR with the DR method of CSVD, so that the number of dimensions retained in each cluster is determined by varying the NMSE. We build SR-tree indices on the vectors in the clusters to determine the number of accessed pages for exact k-NN queries (Thomasian et al., Inf Process Lett 94(6):247-252, 2005; see Section A.3) versus the NMSE. By varying the NMSE a minimum cost can be found, because the lower cost of accessing a smaller index is offset by the higher postprocessing cost resulting from lower retrieval accuracy. Experimenting with one synthetic and three real-world datasets leads to the conclusion that the lowest cost is attained at NMSE ≈ 0.03, with between one third and one half of the dimensions retained. In one case, doubling the number of dimensions cuts the number of accessed pages by one half. The Appendix provides the requisite background information for reading this paper.
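To make the dimensionality-reduction step concrete, the following is a minimal sketch of a CSVD-style procedure: cluster the dataset, apply SVD per cluster, and retain the smallest number of dimensions per cluster such that the overall NMSE stays below a target (e.g., 0.03). This is an illustrative assumption rather than the authors' implementation: plain k-means stands in for the clustering step (the proposed hybrid uses LDR's SVD-friendly clustering instead), and the NMSE is computed here as the fraction of total variance discarded across all clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

def csvd_reduce(X, n_clusters=8, target_nmse=0.03, seed=0):
    """Cluster X, run SVD per cluster, and choose the per-cluster number of
    retained dimensions so that the overall NMSE is at most target_nmse.
    (Hypothetical helper for illustration only.)"""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    clusters, pooled = [], []
    for c in range(n_clusters):
        Xc = X[labels == c]
        mu = Xc.mean(axis=0)
        # SVD of the centered cluster; squared singular values measure the
        # variance captured along each principal direction.
        _, s, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        energy = s ** 2
        clusters.append({"mean": mu, "basis": Vt, "dims": 0})
        pooled += [(e, c) for e in energy]
    total = sum(e for e, _ in pooled)
    kept = 0.0
    # Keep the highest-variance directions (pooled across clusters) until the
    # discarded fraction of total variance -- the NMSE -- falls below the target.
    for e, c in sorted(pooled, reverse=True):
        if 1.0 - kept / total <= target_nmse:
            break
        clusters[c]["dims"] += 1
        kept += e
    return labels, clusters

# Usage: project each cluster's points onto its retained principal directions.
# X = np.random.rand(10000, 64)
# labels, clusters = csvd_reduce(X, n_clusters=8, target_nmse=0.03)
# c0 = clusters[0]
# X0_reduced = (X[labels == 0] - c0["mean"]) @ c0["basis"][:c0["dims"]].T
```

The reduced per-cluster vectors would then be inserted into a per-cluster index (an SR-tree in the paper) and queried with exact k-NN search, with the NMSE target traded off against postprocessing cost as described in the abstract.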