Abstract

Outlier detection (OD) is a key problem for which numerous solutions have been proposed. To cope with the difficulty of detecting outliers across diverse domains and data characteristics, ensembles of outlier detectors have recently been employed to improve on the performance of individual detectors. In this paper, we follow an ensemble outlier detection approach in which competent outlier detectors are selected through an enhanced clustering-based dynamic selection (CBDS) method. In this method, a bisecting K-means clustering algorithm partitions the input data into clusters, each of which defines a local region of competence. From the initial pool of detectors, the outputs of those with the most competent local performance are combined through one of four schemes to produce the final OD results. We evaluated our method against four variants of the locally selective combination in parallel (LSCP) outlier ensemble on 16 public benchmark datasets. The CBDS-based schemes compare well with the LSCP-based ones and incur considerably lower computational costs. The CBDS method consistently achieved higher average scores of the area under the curve (AUC) of the receiver operating characteristic (ROC), and outperformed the LSCP method on nine of the 16 datasets in terms of the AUC score. In addition, while the CBDS and LSCP methods have similar computational costs on small datasets, the CBDS method achieves significant time savings over the LSCP method on large datasets.
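
The selection pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: local competence is taken to be the Pearson correlation between a detector's scores and the mean score of the pool within a cluster (a pseudo ground truth in the style of LSCP), only one combination scheme (averaging the top-ranked detectors) is shown in place of the four schemes, the 2-means split is seeded deterministically with a farthest-point pair, and detector scores are assumed to be on a comparable scale already. All function names (`bisecting_kmeans`, `cbds_scores`, and the helpers) are hypothetical.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points, idx):
    """Coordinate-wise mean of the points at the given indices."""
    return [mean(col) for col in zip(*(points[i] for i in idx))]


def sse(points, idx):
    """Sum of squared distances of a cluster's points to its centroid."""
    c = centroid(points, idx)
    return sum(sq_dist(points[i], c) for i in idx)


def two_means(points, idx, iters=20):
    """Split one cluster in two with 2-means (assumed farthest-pair seeding)."""
    c = centroid(points, idx)
    a = max(idx, key=lambda i: sq_dist(points[i], c))
    b = max(idx, key=lambda i: sq_dist(points[i], points[a]))
    c0, c1 = points[a], points[b]
    left, right = [], []
    for _ in range(iters):
        new_left = [i for i in idx
                    if sq_dist(points[i], c0) <= sq_dist(points[i], c1)]
        in_left = set(new_left)
        new_right = [i for i in idx if i not in in_left]
        if not new_left or not new_right:
            # Degenerate split (e.g. identical points): fall back to halving.
            half = len(idx) // 2
            return idx[:half], idx[half:]
        if new_left == left:
            break  # assignments stopped changing
        left, right = new_left, new_right
        c0, c1 = centroid(points, left), centroid(points, right)
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) >= 2]
        if not splittable:
            break
        worst = max(splittable, key=lambda c: sse(points, c))
        clusters.remove(worst)
        clusters.extend(two_means(points, worst))
    return clusters


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def cbds_scores(points, detector_scores, k=2, top=1):
    """detector_scores[d][i] is detector d's score for point i (higher = more outlying)."""
    final = [0.0] * len(points)
    for idx in bisecting_kmeans(points, k):
        # Scores of every detector restricted to this region of competence.
        local = [[s[i] for i in idx] for s in detector_scores]
        # Assumed pseudo ground truth: mean score of the whole pool.
        target = [mean(col) for col in zip(*local)]
        ranked = sorted(range(len(local)),
                        key=lambda d: pearson(local[d], target), reverse=True)
        chosen = ranked[:top]
        # Assumed combination scheme: average of the selected detectors.
        for pos, i in enumerate(idx):
            final[i] = mean(local[d][pos] for d in chosen)
    return final
```

Calling `cbds_scores(points, detector_scores, k, top)` with a list of feature vectors and one score list per detector returns one combined outlier score per point. Because detectors are ranked separately inside each cluster, a detector that tracks the pool consensus in one region but not in another contributes only where it is locally competent.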
