Abstract

Similarity search has been widely used in many applications such as information retrieval, image data analysis, and time-series matching. Previous work on similarity search usually consider the search problem in the full space. In this paper, however, we tackle a problem, subspace similarity search, which finds all data objects that match with a query object in the subspace instead of the original full space. In particular, the query object can specify arbitrary subspace with arbitrary number of dimensions. Due to the exponential number of possible subspaces specified by users, we introduce an efficient and effective pruning technique, which assigns scores to data objects with respect to pivots and prunes candidates via scores. We propose an effective multipivot-based method to preprocess data objects by selecting appropriate pivots, where the entire procedure is guided by a formal cost model, such that the pruning power is maximized. Then, scores of each data object are organized in sorted lists to facilitate an efficient subspace similarity search. Furthermore, many real-world application data such as image databases, time-series data, and sensory data often contain noises, which can be modeled as uncertain objects. Different from certain data, efficient query processing on uncertain data is more challenging due to its intensive computation of probability confidences. Thus, it is also crucial to answer subspace queries efficiently and effectively over uncertain objects. Specifically, we define a novel query, namely probabilistic subspace range query (PSRQ) in the uncertain database, which finds objects within a distance from a query object in any subspace with high probability. To address this query, we extend our proposed pruning techniques for precise data to that of answering PSRQ in arbitrary subspaces. Extensive experiments demonstrated the performance of our proposed approaches.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.