Performance bottleneck of subsequence matching in time-series databases: Observation, solution, and performance evaluation

Sang-Wook Kim,Byeong-Soo Jeong

doi:10.1016/j.ins.2007.06.032

Abstract

Subsequence matching is an operation that finds subsequences whose changing patterns are similar to a given query sequence from time-series databases. This paper identifies a performance bottleneck in subsequence matching, and then proposes an effective method that substantially improves the performance of entire subsequence matching by resolving the performance bottleneck. First, we analyze the disk access and CPU processing times required during the index searching and post-processing steps of subsequence matching through preliminary experiments. Based on these results, we show that the post-processing step is a main performance bottleneck in subsequence matching. Then, we argue that the optimization of the post-processing step is a crucial issue overlooked in previous approaches. In order to resolve the performance bottleneck, we propose a simple yet highly effective method for expediting the post-processing step. By rearranging the order of candidate subsequences to be compared with a query sequence, our method completely eliminates the redundancies of disk accesses and CPU processing that occur in the post-processing step. Our method is fairly efficient, and does not incur any false dismissal. We quantitatively demonstrate the superiority of our method through extensive experimentation. The results show that our method produces a significantly faster post-processing step; When using a data set of real-world stock sequences, our method was 43.36–96.75 times faster than previous methods, and when using data sets of large numbers of synthetic sequences, our method was 12.48–26.95 times faster than previous methods. Also, the results show that our method reduces the weight of the post-processing step over entire subsequence matching from more than 97% to less than 67%. This implies that our method successfully resolves the performance bottleneck in subsequence matching. As a result, our method provides excellent performance in entire subsequence matching. Compared with previous methods, our method is 16.17–32.64 times faster when using a data set of real-world stock sequences and 8.64–14.29 times faster when using data sets of large numbers of synthetic sequences.

Full Text