Abstract

Classification tasks usually assume that all possible classes are present during the training phase. This is restrictive if the algorithm is used over a long time and possibly encounters samples from unknown new classes. It is therefore fundamental to develop algorithms able to distinguish between normal and abnormal test data. In the last few years, extreme value theory has become an important tool in multivariate statistics and machine learning. The recently introduced extreme value machine, a classifier motivated by extreme value theory, addresses this problem and achieves competitive performance in specific cases. We show that this algorithm has some theoretical and practical drawbacks and can fail even if the recognition task is fairly simple. To overcome these limitations, we propose two new algorithms for anomaly detection relying on approximations from extreme value theory that are more robust in such cases. We exploit the intuition that test points that are extremely far from the training classes are more likely to be abnormal objects. We derive asymptotic results motivated by univariate extreme value theory that make this intuition precise. We show the effectiveness of our classifiers in simulations and on real data sets.
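The intuition described in the abstract, that a test point lying extremely far from the training classes is likely abnormal, can be illustrated with a peaks-over-threshold construction. The sketch below is only an assumption-laden illustration (toy Gaussian data, nearest-neighbour distances, an arbitrary 90% threshold, and SciPy's genpareto), not the paper's actual GPDC algorithm: it fits a generalized Pareto distribution to the largest distances observed within the training class and flags a test point whose distance has a very small tail probability.

```python
# Illustrative peaks-over-threshold anomaly score; all data, thresholds and
# design choices below are assumptions for demonstration, not the paper's method.
import numpy as np
from scipy.stats import genpareto
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

# "Normal" training class and a held-out test point (toy data).
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
x_test = np.array([[6.0, 6.0]])          # far from the training class

# Distance of every training point to its nearest neighbour in the class;
# the large values form the upper tail modelled by a generalized Pareto law.
d_train = cdist(X_train, X_train)
np.fill_diagonal(d_train, np.inf)
nn_dist = d_train.min(axis=1)

# Peaks over threshold: keep exceedances above a high empirical quantile.
u = np.quantile(nn_dist, 0.90)           # threshold choice is an assumption
excess = nn_dist[nn_dist > u] - u

# Fit the GPD to the excesses (location fixed at 0).
shape, _, scale = genpareto.fit(excess, floc=0.0)

# Score the test point: tail probability of observing a distance this large.
d_test = cdist(x_test, X_train).min()
if d_test <= u:
    p_tail = 1.0                         # not in the tail, treated as normal
else:
    p_tail = (nn_dist > u).mean() * genpareto.sf(d_test - u, shape, loc=0.0, scale=scale)

print(f"tail probability: {p_tail:.2e}")  # a very small value suggests an anomaly
```

A small tail probability can then be thresholded to decide whether the test point belongs to a known class or should be rejected as abnormal; the choice of that final threshold is again an assumption of this sketch.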

Highlights

  • Modern classifiers achieve human or super-human performance in a variety of tasks (Christopher 2016), including speech (Graves et al 2013) and image recognition (He et al 2016), but they are typically not able to discriminate between normal and abnormal classes and may give high-confidence predictions for unrecognizable objects

  • We present two new kernel-free algorithms that perform anomaly detection using extreme value theory

  • These algorithms, called the generalized Pareto distribution (GPD) classifier (GPDC) and the generalized extreme value (GEV) classifier (GEVC), are fast to update with the arrival of new data and easy to adapt to an incremental framework (see the sketch after this list)
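
To make the incremental claim in the last bullet concrete, the sketch below shows one plausible way to keep such a tail model up to date: only the exceedances over a fixed high threshold are stored, and the GPD fit is refreshed when new data arrive. The IncrementalTailModel class, the threshold value and the refit rule are illustrative assumptions, not the paper's GPDC/GEVC implementation.

```python
# Minimal sketch of why a GPD tail fit is cheap to keep up to date: only the
# stored exceedances need to be refreshed when new samples arrive. This class
# is an illustrative assumption, not the paper's GPDC implementation.
import numpy as np
from scipy.stats import genpareto


class IncrementalTailModel:
    def __init__(self, threshold):
        self.threshold = threshold       # fixed high threshold (assumed given)
        self.excesses = []               # stored exceedances over the threshold
        self.params = None               # (shape, scale) of the fitted GPD

    def update(self, distances):
        """Add new distance observations and refit the tail."""
        new_exc = [d - self.threshold for d in distances if d > self.threshold]
        self.excesses.extend(new_exc)
        if len(self.excesses) >= 10:     # refit once enough tail data is available
            shape, _, scale = genpareto.fit(np.asarray(self.excesses), floc=0.0)
            self.params = (shape, scale)

    def tail_prob(self, d):
        """Survival probability of a new distance d under the fitted tail."""
        if self.params is None or d <= self.threshold:
            return 1.0
        shape, scale = self.params
        return genpareto.sf(d - self.threshold, shape, loc=0.0, scale=scale)


# Usage: stream batches of distances and query the model between batches.
rng = np.random.default_rng(1)
model = IncrementalTailModel(threshold=2.5)
for _ in range(5):
    model.update(rng.exponential(scale=1.0, size=200))
print(model.tail_prob(8.0))              # a small value flags a likely anomaly
```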

Summary

Introduction

Modern classifiers achieve human or super-human performance in a variety of tasks (Christopher 2016), including speech (Graves et al 2013) and image recognition (He et al 2016), but they are typically not able to discriminate between normal and abnormal classes and may give high-confidence predictions for unrecognizable objects. We underline that in this context standard hyperparameter optimization procedures such as cross-validation are usually not available, since the training set contains only normal objects. For this reason, an algorithm designed for anomaly detection should involve as few hyperparameters as possible, and we propose an alternative approach based on extreme value theory that overcomes this problem.

Related work
Extreme value theory
General setting
Algorithm description
Limitations of the EVM
The GPD classifier
Extreme value theory and anomaly detection
The GPDC algorithm
The GEV classifier
Application
Simulated data
OLETTER protocol
Diagnostics of thyroid disease
Findings
Conclusion
