Abstract

Most outlier detection methods output an outlier score that measures how far a data sample deviates from the normal data pattern. However, it is difficult to choose an optimal threshold on these scores for separating outliers from normal data samples. In this paper, we propose a tree-based outlier detection method that computes normalized outlier scores for data samples. In particular, it provides binary labels for outlier prediction without requiring a threshold on the outlier score. Using training data that consists of normal data samples, the proposed method builds a multi-way splitting tree, called a region-partition tree (RP-tree), in which the normal data region is described by the partition of the data region into leaf nodes. By utilizing a region-partition table (RP-table), which stores the information on splitting attributes and interval partitions, an RP-tree can be constructed so that it finely splits the normal data region while keeping the tree reasonably small. From an ensemble of RP-trees, the proposed method computes normalized outlier scores in the range [0, 1], and data samples with an outlier score of 1 are predicted as outliers. It also identifies the attributes responsible for an outlier prediction. Experimental results demonstrate the outlier detection performance of the proposed method: it obtained an average F1-value of 0.72 and an AUC score of 0.96, while the second-best of the compared methods obtained an F1-value of 0.57 and an AUC score of 0.94.
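
The threshold-free labeling idea can be illustrated with a minimal sketch. It does not reproduce the RP-tree or RP-table construction, which the abstract does not specify. As a stand-in, each ensemble member is assumed to be an equal-width grid over a random subset of attributes, and the score is assumed to be the fraction of members that place a sample outside every cell occupied by normal training data. All function names and parameters (`build_partition`, `fit_ensemble`, `n_trees`, `n_attrs`, `bins`) are hypothetical. Only the final rule, that a score of 1 means outlier, comes from the abstract.

```python
import random

def build_partition(train, attrs, bins):
    """Record which grid cells over the chosen attributes hold normal training samples.

    Assumption: an equal-width grid stands in for the paper's RP-tree leaves.
    """
    lows = {a: min(row[a] for row in train) for a in attrs}
    highs = {a: max(row[a] for row in train) for a in attrs}

    def cell(row):
        idx = []
        for a in attrs:
            lo, hi = lows[a], highs[a]
            if row[a] < lo or row[a] > hi:
                return None  # outside the range seen in normal data
            width = (hi - lo) / bins or 1.0  # guard against a constant attribute
            idx.append(min(int((row[a] - lo) / width), bins - 1))
        return tuple(idx)

    occupied = {cell(row) for row in train}
    return cell, occupied

def fit_ensemble(train, n_trees=10, n_attrs=2, bins=4, seed=0):
    """Build several partitions, each over a random subset of attributes."""
    rng = random.Random(seed)
    dims = range(len(train[0]))
    return [build_partition(train, rng.sample(dims, n_attrs), bins)
            for _ in range(n_trees)]

def outlier_score(ensemble, row):
    """Fraction of partitions in which the sample misses every normal cell (in [0, 1])."""
    votes = sum(1 for cell, occupied in ensemble if cell(row) not in occupied)
    return votes / len(ensemble)

def predict(ensemble, row):
    """Label a sample as an outlier only when every partition rejects it."""
    return outlier_score(ensemble, row) == 1.0

# Normal training data: a regular grid of points in the unit cube.
train = [(i / 4, j / 4, k / 4) for i in range(5) for j in range(5) for k in range(5)]
ensemble = fit_ensemble(train)

print(outlier_score(ensemble, (10.0, 10.0, 10.0)), predict(ensemble, (10.0, 10.0, 10.0)))
print(outlier_score(ensemble, train[0]), predict(ensemble, train[0]))
```

Because the label depends only on unanimous rejection, no score threshold has to be tuned. The intermediate scores still rank samples that some, but not all, partitions reject.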
