Abstract

Subsequence matching is one of the most important issues in the field of data mining. Existing subsequence matching algorithms use windows of a fixed size to construct a single index. These algorithms share a problem: their performance degrades as the difference between the query sequence length and the window size increases. In this paper, we propose a new subsequence matching method based on index interpolation, a technique that constructs indexes for multiple window sizes and, for a given query sequence, chooses the most appropriate index for subsequence matching. We first examine how performance changes with the window size through preliminary experiments, and devise a cost function for subsequence matching that reflects the distribution of query sequence lengths from the viewpoint of physical database design. Next, we propose a new subsequence matching method that improves search performance, and present an algorithm based on the cost function that constructs the multiple indexes so as to maximize performance. Finally, we verify the superiority of the proposed method through a series of experiments using real and synthetic data sequences.
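
The index-selection step can be illustrated with a minimal sketch. The abstract does not state the selection rule, so the code below assumes one plausible rule: among the window sizes for which an index was built, pick the largest one that does not exceed the query sequence length, which keeps the gap between query length and window size as small as possible. The function name `choose_window_size` and the example window sizes are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def choose_window_size(window_sizes, query_length):
    """Pick the window size whose index should serve a query.

    Assumed rule (not stated in the abstract): use the largest indexed
    window size that is less than or equal to the query length.
    Returns None when the query is shorter than every indexed window.
    """
    sizes = sorted(window_sizes)
    # bisect_right gives the count of sizes that are <= query_length.
    pos = bisect_right(sizes, query_length)
    return sizes[pos - 1] if pos > 0 else None


# Hypothetical set of window sizes for which indexes were built.
indexed_sizes = [64, 128, 256, 512]

for query_length in (50, 64, 300, 1000):
    print(query_length, choose_window_size(indexed_sizes, query_length))
```

Under this assumed rule, a query shorter than the smallest window has no usable index, so a real system would need a fallback such as a sequential scan. Which window sizes to build in the first place is what the cost function in the paper is meant to decide; that part is not sketched here.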
