Abstract

In this paper, we propose a method for detecting outliers in an unlabeled target dataset by exploiting an unlabeled source dataset. Outlier detection has attracted the attention of data miners for over two decades, since outliers can be crucial in decision making, knowledge discovery, and fraud detection, to name but a few applications. Because outliers are scarce and often tedious to label, researchers have proposed methods that detect them from an unlabeled dataset, some of which borrow strength from relevant labeled datasets in the framework of transfer learning. He et al. tackled a more challenging situation in which the input datasets, coming from multiple tasks, are all unlabeled. Their method, ML-OCSVM, conducts multi-task learning with one-class support vector machines (SVMs) and yields a mean model plus task-specific increments to detect outliers in the test datasets of the multiple tasks. We inherit part of their problem setting, taking only unlabeled datasets as input, but increase the difficulty by assuming only one source dataset in addition to the target dataset. Consequently, the source dataset consists of examples relevant to the target task as well as examples that are less relevant. To cope with this situation, we extend the Selective Transfer Machine, which weights individual examples in the framework of covariate shift and learns an SVM classifier, to our one-class setting by replacing its binary SVM with a one-class SVM. Experiments on two public datasets and an artificial dataset show that our method mostly outperforms baseline methods, including ML-OCSVM and a state-of-the-art ensemble anomaly detection method, in F1 score and AUC.
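The idea of weighting individual source examples under covariate shift can be illustrated with a small sketch. It is not the optimization used by the Selective Transfer Machine or by the proposed method. It only assumes that a source example x receives a weight proportional to an estimated density ratio p_target(x) / p_source(x), here computed with one-dimensional Gaussian kernel density estimates. The function names, the bandwidth, and the toy data are hypothetical and chosen for illustration.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a function that estimates the density at x from 1-D samples."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density


def importance_weights(source, target, bandwidth=0.5, eps=1e-12):
    """Weight each source example by an estimate of p_target(x) / p_source(x).

    Illustrative stand-in only: the weighting step of the actual method
    is not specified here, so a plain density ratio is assumed.
    """
    p_src = gaussian_kde(source, bandwidth)
    p_tgt = gaussian_kde(target, bandwidth)
    raw = [p_tgt(x) / (p_src(x) + eps) for x in source]
    mean = sum(raw) / len(raw)
    # Normalise so that the weights average to 1.
    return [w / mean for w in raw]


# Hypothetical toy data: three source examples lie near the target
# distribution, two lie far away and should be down-weighted.
source = [0.0, 0.2, 0.4, 5.0, 5.2]
target = [0.1, 0.3, 0.5]
weights = importance_weights(source, target)
```

In this toy setting, the source examples near the target data receive larger weights than the distant ones. Weights of this kind could then be supplied to a one-class SVM that supports per-example weights, for instance through the `sample_weight` argument of `fit` in scikit-learn's `OneClassSVM`.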
