Abstract

With the rapid expansion of data scale, big data mining and analysis have attracted increasing attention. Outlier detection, an important data mining task, is widely used in many applications. However, conventional outlier detection methods have difficulty handling large-scale datasets. In addition, most of them can identify only global outliers and are overly sensitive to parameter variation. In this paper, we propose a novel method for robust local outlier detection with statistical parameters, which incorporates clustering-based ideas to deal with big data. First, the method finds density peaks of the dataset using the 3σ criterion. Second, each remaining data object in the dataset is assigned to the same cluster as its nearest neighbor of higher density. Finally, we use Chebyshev's inequality and density peak reachability to identify the local outliers of each cluster. The experimental results demonstrate the efficiency and accuracy of the proposed method in identifying both global and local outliers. Moreover, stability analysis shows the method to be more stable than typical outlier detection methods such as LOF (Local Outlier Factor) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
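The three steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the density estimator, the quantity the 3σ criterion is applied to, or the exact outlier score, so the Gaussian-kernel density, the cutoff `dc`, the multiplier `k`, the extra above-average-density condition on peaks, and the function name `detect_local_outliers` are all assumptions made here for illustration. The density peak reachability test is omitted.

```python
import numpy as np

def detect_local_outliers(X, dc=1.0, k=3.0):
    """Illustrative sketch only; dc, k and the scoring rule are assumptions."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # Assumed local density: Gaussian kernel with cutoff dc (self term removed).
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0

    # delta[i]: distance to the nearest point of higher density (its "parent").
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()  # densest point has no parent
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j

    # Step 1 (assumed form of the 3-sigma criterion): a peak has delta above
    # mean + 3*std and at least average density. The densest point is a peak.
    peaks = (delta > delta.mean() + 3.0 * delta.std()) & (rho >= rho.mean())
    peaks[order[0]] = True

    # Step 2: every other point joins the cluster of its parent.
    labels = np.full(n, -1)
    labels[peaks] = np.arange(peaks.sum())
    for i in order:  # parents are always visited before their children
        if labels[i] < 0:
            labels[i] = labels[parent[i]]

    # Step 3 (assumed score): within each cluster, flag non-peak points whose
    # delta exceeds mean + k*std. By Chebyshev's inequality, at most 1/k^2 of
    # any distribution lies that far from its mean.
    outlier = np.zeros(n, dtype=bool)
    for c in range(peaks.sum()):
        members = np.flatnonzero((labels == c) & ~peaks)
        if members.size:
            d = delta[members]
            outlier[members] = d > d.mean() + k * d.std()
    return labels, np.flatnonzero(outlier)

# Toy data: two 5x5 grids and one isolated point (index 50).
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([grid, grid + [20.0, 0.0], [[2.0, 10.0]]])
labels, outliers = detect_local_outliers(X)
print(outliers)
```

On this toy dataset the sketch finds two clusters and flags only the isolated point. The pairwise distance matrix makes it quadratic in memory, so it is suitable only for small examples and not for the large-scale setting the paper targets.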
