Probabilistic Similarity Search in Uncertain Time-Series Database

Xiaofeng Ding, Yansheng Lu, Aling Qian

doi:10.1109/iciecs.2009.5365539

Abstract

Nearest neighbor search over time-series databases has long been an active research topic and is widely used in applications such as information retrieval, genetic data matching, and data mining. However, because time series are high-dimensional (i.e., long) and uncertain, similarity search over directly indexed precise time series usually encounters serious problems, such as the "dimensionality curse" and the "trust ability curse". Conventionally, dimensionality reduction techniques and uncertainty processing strategies have been proposed separately to address these problems, by reducing the dimensionality of time series and by modeling the data uncertainty, respectively. However, none of the proposed methods provides an indexing mechanism that supports efficient similarity queries over very large uncertain time-series databases. In this paper, we re-investigate piecewise linear approximation (PLA) for approximating and indexing uncertain time series. In particular, we propose a novel distance function in the reduced PLA-space that lower-bounds the Euclidean distance between the original uncertain time series, which guarantees no false negatives during similarity search. Then, based on three lemmas, we develop an effective approach to index these lower bounds and improve nearest neighbor query efficiency. Finally, extensive experiments over synthetic data sets demonstrate the efficiency and effectiveness of PLA together with the newly proposed lower bound lemmas, in terms of both pruning power and wall clock time, compared with the baseline algorithm.
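
The lower-bounding idea can be illustrated with a minimal sketch for ordinary (precise) time series. This is not the distance function proposed in the paper, which targets uncertain series and is not specified in the abstract. The sketch assumes equal-length segments and a least-squares line per segment; the function names `pla_reconstruct` and `pla_lower_bound` are hypothetical. Because a per-segment least-squares fit is an orthogonal projection, the distance between two reconstructions should never exceed the Euclidean distance between the original series, which is what allows pruning without false negatives.

```python
import math

def pla_reconstruct(series, num_segments):
    """Fit a least-squares line to each equal-length segment and return
    the piecewise-linear reconstruction (same length as the input)."""
    n = len(series)
    if n % num_segments != 0:
        raise ValueError("series length must be divisible by num_segments")
    seg_len = n // num_segments
    approx = []
    for s in range(num_segments):
        seg = series[s * seg_len:(s + 1) * seg_len]
        t_mean = (seg_len - 1) / 2.0
        v_mean = sum(seg) / seg_len
        var_t = sum((t - t_mean) ** 2 for t in range(seg_len))
        cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(seg))
        # A single-point segment has no slope; fall back to a constant.
        slope = cov / var_t if var_t > 0 else 0.0
        approx.extend(v_mean + slope * (t - t_mean) for t in range(seg_len))
    return approx

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pla_lower_bound(a, b, num_segments):
    """Distance between PLA reconstructions (assumption: precise series).
    It lower-bounds euclidean(a, b) since projection is non-expansive."""
    return euclidean(pla_reconstruct(a, num_segments),
                     pla_reconstruct(b, num_segments))
```

In a nearest neighbor search, a candidate whose `pla_lower_bound` to the query already exceeds the best exact distance found so far can be skipped without computing its exact distance.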

Full Text