Abstract

Outlier detection is a knowledge discovery problem that seeks to identify data points that do not agree with the majority of data points in a dataset. In real-world datasets, the majority of data points normally line up into patterns that can be captured by some model. In this paper, we propose a new outlier detection algorithm based on a dynamically updated tree model. It consists of two steps: (1) constructing the extreme-centroid tree from a sample of the dataset, and (2) dynamically updating the extreme-centroid tree. In the construction step, the root first identifies two extreme data points relative to the centroid of the sample and uses them to split the data points into groups. Splitting continues until the termination criterion is met. A leaf node containing a single data point is marked as a suspected outlier in this process. The suspected outliers are trimmed from the tree model and returned to the rest of the dataset. In the dynamic update step, a data point from the rest of the dataset, called the newly inserted data point, is inserted into the tree model, and a single data point in the tree model, called the expired data point, is randomly removed to keep the number of current data points constant. The tree is adjusted for the newly inserted data point and the expired data point while maintaining linear time complexity. We compared our algorithm with the LOF and COF algorithms on a synthetic dataset and three UCI datasets. In the UCI datasets, a majority class is selected and data points from the other classes are randomly picked as the outliers. The results show that our algorithm outperformed LOF and COF in terms of precision, recall, and F-measure.
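
The construction step can be illustrated with a minimal Python sketch. It is not the authors' implementation, since the abstract does not specify how the two extreme points are chosen or what the termination criterion is. The sketch assumes that the first extreme is the point farthest from the centroid, that the second extreme is the point farthest from the first, that each point joins the group of its nearer extreme, and that splitting stops once a group has at most `max_leaf_size` points. The names `build_tree`, `leaves`, `suspected_outliers`, and `max_leaf_size` are hypothetical, and the dynamic update step is not covered.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def build_tree(points, max_leaf_size=3):
    """Recursively split points around two extreme points.

    Assumptions (not taken from the paper): extreme 1 is the point farthest
    from the centroid, extreme 2 is the point farthest from extreme 1, and
    a group of at most max_leaf_size points becomes a leaf.
    """
    if len(points) <= max_leaf_size:
        return {"points": list(points), "children": []}
    c = centroid(points)
    e1 = max(points, key=lambda p: math.dist(p, c))
    e2 = max(points, key=lambda p: math.dist(p, e1))
    near_e1, near_e2 = [], []
    for p in points:
        # Ties go to the first extreme.
        if math.dist(p, e1) <= math.dist(p, e2):
            near_e1.append(p)
        else:
            near_e2.append(p)
    if not near_e2:
        # All points coincide, so no further split is possible.
        return {"points": list(points), "children": []}
    return {
        "points": [],
        "children": [
            build_tree(near_e1, max_leaf_size),
            build_tree(near_e2, max_leaf_size),
        ],
    }


def leaves(node):
    """Yield the point list of every leaf under node."""
    if not node["children"]:
        yield node["points"]
    else:
        for child in node["children"]:
            yield from leaves(child)


def suspected_outliers(tree):
    """Return the points that sit alone in a leaf."""
    return [leaf[0] for leaf in leaves(tree) if len(leaf) == 1]


# Illustrative sample: a small cluster plus one distant point.
sample = [
    (0.0, 0.0), (0.1, 0.9), (1.0, 0.2), (0.9, 1.1),
    (0.5, 0.5), (0.3, 0.7), (10.0, 10.0),
]
tree = build_tree(sample)
suspects = suspected_outliers(tree)
print(suspects)
```

In this sample the distant point (10.0, 10.0) is isolated by the first split and therefore appears among the suspects. Depending on `max_leaf_size`, points inside the cluster may also end up alone in a leaf, which fits the abstract's wording that single-point leaves are only suspected outliers and are returned to the rest of the dataset.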
