Abstract

Labeled data, particularly for the outlier class, are difficult to obtain, so outlier detection is typically treated as an unsupervised learning problem. However, a few labeled data points can often still be obtained. For example, a human analyst can give feedback on a few data points when examining the results of an unsupervised outlier detection method. Moreover, widely used unsupervised outlier detection methods can neither take labeled data into consideration nor use them properly. In this study, we first propose a graph-based method that endows an unsupervised method with the ability to use a few labeled data points. Then, we extend our semi-supervised method to active outlier detection by incorporating a query strategy that selects the top-ranked outliers. Comprehensive experiments on 12 real-world datasets demonstrate that our semi-supervised outlier detection method is comparable with the best state-of-the-art approaches, and that our active outlier detection method outperforms state-of-the-art methods.

Highlights

  • Outlier detection is a classic problem in data mining with many applications, such as network intrusion detection, environmental monitoring, and fraud detection

  • The results show that our semi-supervised outlier detection method is comparable with the best state-of-the-art approaches, and that our active outlier detection method outperforms state-of-the-art methods

  • We propose a method for semi-supervised outlier detection based on an unsupervised outlier detection method and a graph-based semi-supervised learning method

Summary

INTRODUCTION

Outlier detection is a classic problem in data mining with many applications, such as network intrusion detection, environmental monitoring, and fraud detection. In [4], the authors proposed optimizing the weights of an ensemble-based outlier detection method to fit labeled data. They constrained the final result to be a convex combination of the base outlier detectors. In the few-shot learning setting, only the target classes are provided with a few labeled examples, while the other classes are provided with a large number of labeled data. This differs from our problem setting, so these methods cannot be used directly in semi-supervised outlier detection. When the value of the iForest outlier score f_s(x) is larger than 0.5, data point x can be regarded as a potential outlier.
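The 0.5 threshold on the iForest score can be illustrated with a small sketch. The equation the paper uses for f_s is not reproduced in this summary, so the code below assumes the standard isolation forest score, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of x over the trees and c(n) is the average path length for a sample of n points. The function names (`average_path_length`, `iforest_score`, `is_potential_outlier`) are hypothetical and chosen for illustration only.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful search in a binary
    search tree built on n points (normaliser of the standard iForest score)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def iforest_score(mean_path_length: float, n: int) -> float:
    """Standard iForest score s(x, n) = 2 ** (-E[h(x)] / c(n)).
    Assumed here to be what the summary calls f_s(x)."""
    c = average_path_length(n)
    if c == 0.0:
        raise ValueError("sample size n must be at least 2")
    return 2.0 ** (-mean_path_length / c)


def is_potential_outlier(score: float, threshold: float = 0.5) -> bool:
    """Flag a point whose score exceeds the 0.5 threshold from the text."""
    return score > threshold


if __name__ == "__main__":
    n = 256  # illustrative subsample size, not taken from the paper
    c = average_path_length(n)
    for mean_h in (c / 2, c, 2 * c):
        score = iforest_score(mean_h, n)
        print(f"E[h(x)] = {mean_h:.2f}, score = {score:.3f}, "
              f"outlier = {is_potential_outlier(score)}")
```

Under this assumption, a point isolated in exactly the average number of splits scores 0.5, so scores above 0.5 correspond to points that are isolated faster than average.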

SEMI-SUPERVISED OUTLIER DETECTION METHOD
1: Compute the iForest outlier score f_s using Equation 3
EXPERIMENTS AND RESULTS
CONCLUSION
