An Efficient Distance Calculation Method for Uncertain Objects

Lurong Xiao, Edward Hung

doi:10.1109/cidm.2007.368846

Abstract

Recently, the research community has paid increasing attention to querying and mining uncertain data. In tasks such as clustering and nearest-neighbor queries, the expected distance is often used as the distance measure between uncertain data objects. Traditional database systems store an uncertain object by its expected (average) location in the data space. Distances can be calculated easily from the expected locations, but they approximate the true expected distances poorly. Recent work calculates the expected distance as the weighted average of the pairwise distances between the samples of two uncertain objects. However, the pairwise distance calculations take much longer than the former method. In this paper, we propose an efficient method, approximation by single Gaussian (ASG), which calculates the expected distance as a function of the means and variances of the samples of uncertain objects. Theoretical and experimental studies show that ASG combines the high accuracy of the latter method with the fast execution time of the former. We suggest that ASG can play an important role in significantly reducing computational costs in query processing and in data mining tasks such as clustering and outlier detection.
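The trade-off described in the abstract can be illustrated with a small sketch. It is not the ASG formula, which the abstract does not state. It compares the two baseline methods (distance between expected locations, and the average of pairwise sample distances) and then uses a standard identity for the squared Euclidean distance, E||X - Y||^2 = ||mu_X - mu_Y||^2 + tr(Sigma_X) + tr(Sigma_Y) for independent X and Y, to show why means and variances alone can carry the needed information. All function names, the equal sample weights, and the synthetic Gaussian data are assumptions made for illustration.

```python
import math
import random

def mean_point(samples):
    """Component-wise mean of a list of equal-length points."""
    n = len(samples)
    return [sum(p[k] for p in samples) / n for k in range(len(samples[0]))]

def total_variance(samples):
    """Sum over dimensions of the population variance (divide by n)."""
    mu = mean_point(samples)
    n = len(samples)
    return sum((p[k] - mu[k]) ** 2 for p in samples for k in range(len(mu))) / n

def centroid_distance(xs, ys):
    """Baseline 1: distance between the expected locations, O(n + m)."""
    return math.dist(mean_point(xs), mean_point(ys))

def pairwise_expected_distance(xs, ys):
    """Baseline 2: equally weighted average of all pairwise distances, O(n * m)."""
    return sum(math.dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def pairwise_expected_sq_distance(xs, ys):
    """Average of all pairwise squared distances, O(n * m)."""
    return sum(math.dist(x, y) ** 2 for x in xs for y in ys) / (len(xs) * len(ys))

def moment_expected_sq_distance(xs, ys):
    """Same quantity from means and variances only, O(n + m).

    Illustrative identity for the squared distance; NOT the paper's ASG formula.
    """
    return centroid_distance(xs, ys) ** 2 + total_variance(xs) + total_variance(ys)

# Hypothetical uncertain objects, each represented by 2-D samples.
rng = random.Random(0)
obj_a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
obj_b = [(rng.gauss(1.0, 2.0), rng.gauss(-1.0, 0.5)) for _ in range(150)]

print("centroid distance:          ", centroid_distance(obj_a, obj_b))
print("pairwise expected distance: ", pairwise_expected_distance(obj_a, obj_b))
print("pairwise expected sq. dist.:", pairwise_expected_sq_distance(obj_a, obj_b))
print("moment-based sq. dist.:     ", moment_expected_sq_distance(obj_a, obj_b))
```

In this sketch the centroid distance never exceeds the pairwise expected distance, which is one way to see why it is a poor approximation when objects have large spread. The moment-based squared distance matches the pairwise squared distance up to floating-point error while touching each sample only once.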

Full Text