Abstract

In severely imbalanced datasets, traditional binary or multi-class classification typically becomes biased towards the class(es) with far more instances. Under such conditions, modeling and detecting instances of the minority class is very difficult. One-class classification (OCC) detects abnormal data points by comparing them with instances of the known class, and it can help address the severe class imbalance that is especially common in big data. We present a detailed survey of OCC-related works published over approximately the last decade. We group the works into three categories: outlier detection, novelty detection, and deep learning and OCC. We closely examine and evaluate selected works on OCC so that a good cross section of approaches, methods, and application domains is represented in the survey. We discuss the techniques commonly used in OCC for outlier detection and for novelty detection, respectively. We observe that one area largely omitted from OCC-related literature is its application to big data and the problems inherently associated with it, such as severe class imbalance, class rarity, noisy data, feature selection, and data reduction. We believe the survey will be of value to researchers working in these areas of big data.

Highlights

  • The commonly known five Vs of big data are volume, variety, value, veracity, and velocity

  • This study presents a large survey of one-class classification (OCC) methods and approaches, including domain-specific applications, published in the literature over the last decade, i.e., 2010 to May 2021

  • Our survey groups the works into three categories: outlier detection and OCC, novelty detection and OCC, and deep learning and OCC

Summary

Introduction

The commonly known five Vs of big data are volume, variety, value, veracity, and velocity. In [29], the authors present a comparative case study, on several UCI datasets, of OCRF against a number of reference one-class classification algorithms, namely Gaussian density models, Parzen estimators, Gaussian mixture models, and one-class support vector machines. The authors note that OCC is a promising research direction for data stream analysis and can be used for binary classification with only instances from one class, for outlier detection, and for novelty detection. In the context of autonomous structural health monitoring of bridges, Favarelli and Giorgetti [40] present a machine learning approach for automatically detecting anomalies in a bridge structure from vibrational data. They propose two anomaly detection methods based on One-Class Classifier Neural Networks, named OCCNN and OCCNN2.
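To make the idea of learning from a single class concrete, the sketch below shows one of the reference approaches mentioned above, a Gaussian density model, used as a one-class classifier. It is an illustration only and is not taken from any of the surveyed works: the class name `GaussianOneClass`, the diagonal-covariance simplification, the `reject_fraction` parameter, and the synthetic training data are all assumptions made for this example.

```python
import math
import random
import statistics


class GaussianOneClass:
    """One-class classifier built on a diagonal Gaussian density model.

    Hypothetical sketch: the name, the diagonal-covariance assumption,
    and the quantile-based threshold are choices made for illustration.
    """

    def __init__(self, reject_fraction=0.05):
        # Fraction of target-class training points allowed to fall
        # below the acceptance threshold.
        self.reject_fraction = reject_fraction

    def fit(self, X):
        # Only instances of the known (target) class are used for training.
        columns = list(zip(*X))
        self.means = [statistics.fmean(c) for c in columns]
        # A small floor keeps every variance strictly positive.
        self.vars = [max(statistics.pvariance(c), 1e-9) for c in columns]
        scores = sorted(self.log_density(x) for x in X)
        cut = int(self.reject_fraction * len(scores))
        self.threshold = scores[cut]
        return self

    def log_density(self, x):
        # Sum of independent per-feature Gaussian log-densities.
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.means, self.vars)
        )

    def predict(self, x):
        # +1: consistent with the target class; -1: outlier or novelty.
        return 1 if self.log_density(x) >= self.threshold else -1


# Synthetic two-feature target class (assumed data, fixed seed).
rng = random.Random(0)
train = [(rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0)) for _ in range(500)]
model = GaussianOneClass(reject_fraction=0.05).fit(train)

print(model.predict((0.1, 5.3)))    # point close to the training mean
print(model.predict((8.0, -10.0)))  # point far from the training data
```

The threshold is set so that roughly `reject_fraction` of the training points are rejected, which is one common way to trade false alarms against missed anomalies when no minority-class examples are available. Parzen estimators, Gaussian mixture models, and one-class support vector machines follow the same fit-on-one-class pattern but replace the density or boundary model.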
