Abstract
Labeled data, particularly for the outlier class, are difficult to obtain. Thus, outlier detection is typically regarded as an unsupervised learning problem. However, a few labeled data points can often still be obtained. For example, a human analyst can give feedback on a few data points when examining the results of an unsupervised outlier detection method. Moreover, widely used unsupervised outlier detection methods can neither take labeled data into consideration nor use them properly. In this study, we first propose a graph-based method that endows an unsupervised method with the ability to consider a few labeled data points. Then, we extend our semi-supervised method to active outlier detection by incorporating a query strategy that selects top-ranked outliers. Comprehensive experiments on 12 real-world datasets demonstrate that our semi-supervised outlier detection method is comparable with the best state-of-the-art approaches, and that our active outlier detection method outperforms state-of-the-art methods.
Highlights
Outlier detection is a classic problem in data mining with many applications, such as network intrusion detection, environmental monitoring, and fraud detection.
The results show that our semi-supervised outlier detection method is comparable with the best state-of-the-art approaches, and that our active outlier detection method outperforms state-of-the-art methods.
We propose a method for semi-supervised outlier detection based on an unsupervised outlier detection method and a graph-based semi-supervised learning method.
Summary
Outlier detection is a classic problem in data mining with many applications, such as network intrusion detection, environmental monitoring, and fraud detection. In [4], the authors proposed optimizing the weights of an ensemble-based outlier detection method to fit labeled data. They constrained the final result to be a convex combination of the base outlier detectors. In the few-shot learning setting, only the target classes are provided with a few labeled examples, while the other classes are provided with a large number of labeled data. This differs from our problem setting, so these methods cannot be used directly in semi-supervised outlier detection. When the value of fs(x) is larger than 0.5, data point x is regarded as a potential outlier.
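The two mechanical ideas mentioned above, a convex combination of base detector scores and a 0.5 cutoff on a score, can be illustrated with a minimal sketch. This is not the implementation from [4] or from this paper. The function names, the example scores, and the weights are hypothetical. The sketch assumes that base scores are already normalized to [0, 1], and the combined score merely stands in for fs(x) to show the thresholding step.

```python
import numpy as np


def convex_combination(base_scores, weights):
    """Combine base detector scores with convex weights.

    base_scores: array of shape (n_points, n_detectors), assumed to lie in [0, 1].
    weights: array of shape (n_detectors,), non-negative and summing to 1.
    """
    base_scores = np.asarray(base_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # A convex combination requires non-negative weights that sum to one.
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return base_scores @ weights


def flag_potential_outliers(scores, threshold=0.5):
    """Mark points whose score is strictly larger than the threshold."""
    return np.asarray(scores, dtype=float) > threshold


# Hypothetical scores from two base detectors for three data points.
base_scores = np.array([
    [0.9, 0.7],
    [0.2, 0.1],
    [0.6, 0.3],
])
weights = np.array([0.5, 0.5])  # illustrative weights, not learned from labels

combined = convex_combination(base_scores, weights)
flags = flag_potential_outliers(combined)
print(combined, flags)
```

In a setting like the one described for [4], the weights would be fitted to the labeled data rather than fixed by hand. The convexity check is kept explicit here so that any fitted weights can be validated before use.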