Abstract

Outlier detection is a fundamental task for knowledge discovery in data mining, especially in the Big Data era. It aims to detect data items that deviate from the general pattern of a given data set. In this paper, we present a new outlier detection technique that uses tourist walks started from each data sample with varying memory sizes. Specifically, a data sample receives a high outlier score if it participates in few tourist walk attractors and a low score if it participates in many. Experimental results on artificial and real data sets show good performance of the proposed method. Compared with classical outlier detection methods, the proposed one has the following salient features: (1) It finds outliers by identifying the structure of the input data set instead of considering only physical features such as distance, similarity, or density. (2) It can detect not only external outliers, as classical methods do, but also internal outliers located among various normal data groups. (3) By varying the memory size, the tourist walks can characterize both local and global structures of the data set. (4) A parallel implementation is straightforward because the algorithm consists of a large number of independent walks. (5) The proposed method is deterministic, so a single run is sufficient, in contrast to stochastic techniques, which require many runs. Moreover, in this work we find, for the first time, that tourist walks can generate complex attractors in various crossing shapes. Such complex attractors reveal data structures in more detail and can consequently improve outlier detection performance.
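The scoring idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the usual tourist walk rule (move to the nearest point not visited within the last `mu` steps), Euclidean distances, and a hypothetical normalisation of the score as one minus the relative attractor participation count. The names `tourist_walk_attractor`, `outlier_scores`, and `memory_sizes` are illustrative only.

```python
import numpy as np

def tourist_walk_attractor(dist, start, mu):
    """Return the set of points on the attractor (cycle) reached from `start`.

    Assumed rule: at each step the walker moves to the nearest point that is
    not among the last `mu` visited points (current point included).
    """
    n = dist.shape[0]
    if not 1 <= mu < n:
        raise ValueError("mu must satisfy 1 <= mu < number of points")
    path = [start]
    seen = {}  # memory window -> index in path where it first occurred
    while True:
        window = tuple(path[-mu:])
        if window in seen:
            # The dynamics depend only on the window, so a repeated window
            # closes a cycle: the attractor is the stretch between repeats.
            return set(path[seen[window]:len(path) - 1])
        seen[window] = len(path) - 1
        forbidden = set(window)
        current = path[-1]
        # A stable sort makes tie-breaking deterministic (lowest index first).
        for j in np.argsort(dist[current], kind="stable"):
            if int(j) not in forbidden:
                path.append(int(j))
                break

def outlier_scores(points, memory_sizes):
    """Score each point by how rarely it appears in attractors (assumed formula)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    counts = np.zeros(len(points))
    for mu in memory_sizes:
        for start in range(len(points)):  # walks are independent of each other
            for p in tourist_walk_attractor(dist, start, mu):
                counts[p] += 1
    top = counts.max()
    return 1.0 - counts / top if top > 0 else np.ones(len(points))

# Toy data: four nearby points on a line and one distant point.
points = np.array([[0.0], [1.0], [2.1], [3.3], [50.0]])
scores = outlier_scores(points, memory_sizes=(1, 2))
```

Because every walk is independent, the loop over `start` and `mu` could be distributed across workers without changing the result, which is the property that feature (4) refers to. The sketch finds only simple cycles by window repetition and does not attempt to characterise the crossing-shaped attractors mentioned in the abstract.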
