Abstract

Instance reduction techniques are data preprocessing methods originally developed to enhance the nearest neighbor rule for standard classification. They reduce the training data by selecting or generating representative examples of a given problem. These algorithms have been designed and widely analyzed for multi-class problems, providing very competitive results. However, this issue has rarely been addressed in the context of one-class classification. In this specific domain, a reduction of the training set may not only decrease the classification time and the classifier’s complexity, but also allow us to handle internal noisy data and simplify the data description boundary. We propose two methods for achieving this goal. The first one is a flexible framework that adapts any instance reduction method to the one-class scenario by introducing meaningful artificial outliers. The second one is a novel modification of an evolutionary instance reduction technique that is based on differential evolution and uses a consistency measure for model evaluation in filter or wrapper modes. It is a powerful native one-class solution that does not require access to counterexamples. Both of the proposed algorithms can be applied to any type of one-class classifier. On the basis of extensive computational experiments, we show that the proposed methods are highly efficient techniques for reducing complexity and improving classification performance in one-class scenarios.

Highlights

  • Data preprocessing is an essential step within the machine learning process [41,21,42]

  • To adapt the Scale Factor Local Search in Differential Evolution (SFLSDE) algorithm to the nature of one-class classification (OCC), we propose to augment it with an optimization criterion based on the consistency metric

  • For some datasets, the standard instance reduction (InR) techniques returned a training set too small to build a one-class support vector classifier. This is because they use the nearest neighbor approach, which imposes no lower bound on the size of the reduced training set, while methods based on support vectors require a certain minimum number of samples for processing
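The consistency-based evaluation mentioned in the highlights can be sketched as a fitness function over candidate instance subsets. The sketch below is a minimal illustration, not the paper's exact criterion: it assumes a 1-NN acceptance rule, derives the acceptance threshold from the mean nearest-neighbour distance inside the reduced set, and weights consistency against reduction rate with a hypothetical `alpha` parameter.

```python
import numpy as np

def occ_fitness(X, mask, alpha=0.5):
    """Hypothetical fitness for an instance-selection mask in OCC.

    Combines the fraction of target samples the reduced set still
    "accepts" (a consistency proxy) with the achieved reduction rate.
    The threshold choice and the alpha weight are illustrative
    assumptions, not values taken from the paper.
    """
    S = X[mask]
    if len(S) == 0:
        return 0.0
    # Acceptance threshold: mean nearest-neighbour distance within S
    # (infinite for a singleton set, which trivially accepts everything).
    d_SS = np.linalg.norm(S[:, None] - S[None, :], axis=-1)
    np.fill_diagonal(d_SS, np.inf)
    theta = d_SS.min(axis=1).mean()
    # Distance from every target sample to its nearest prototype in S.
    d_XS = np.linalg.norm(X[:, None] - S[None, :], axis=-1).min(axis=1)
    acceptance = float((d_XS <= theta).mean())
    reduction = 1.0 - len(S) / len(X)
    return alpha * acceptance + (1 - alpha) * reduction
```

A wrapper-style evolutionary search such as SFLSDE could then maximize this score over binary selection masks, favouring small subsets that still cover the target class.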

Summary

Introduction

Data preprocessing is an essential step within the machine learning process [41,21,42]. Generating artificial counterexamples has been used so far in the process of training one-class classifiers [26], but not during the one-class preprocessing phase. This approach can be viewed as a data-level solution, as we modify our training data to allow unaltered usage of any InR algorithm from the literature. We present a family of data-level and algorithm-level InR methods for OCC and validate their usefulness and impact on training set reduction, classification accuracy and recognition time on the basis of thorough computational experiments. Such a comparison allows us to gain insight into how the size of the training set can be reduced in the absence of counterexamples, while maintaining or even improving the obtained predictive performance.

2 Related Works

This section provides the necessary background for the remainder of the paper.
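The data-level idea of generating artificial counterexamples so that any standard InR algorithm can run unchanged can be illustrated with a small sketch. Everything below is an assumption for illustration only: candidate outliers are drawn uniformly inside the target data's bounding box expanded by a `margin`, and kept only if they lie farther from every target sample than the targets' mean nearest-neighbour distance; the paper's actual generation scheme may differ.

```python
import numpy as np

def generate_artificial_outliers(X_target, n_outliers=100, margin=0.5, seed=0):
    """Sketch of data-level artificial outlier generation (assumed scheme).

    Returns n_outliers points that surround the target class, so a
    standard two-class instance reduction method can be applied to the
    combined data. Parameter names are illustrative, not from the paper.
    """
    rng = np.random.default_rng(seed)
    lo, hi = X_target.min(axis=0), X_target.max(axis=0)
    span = hi - lo
    lo_e, hi_e = lo - margin * span, hi + margin * span

    # Reference scale: mean nearest-neighbour distance among targets.
    d = np.linalg.norm(X_target[:, None] - X_target[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    theta = d.min(axis=1).mean()

    outliers = []
    while len(outliers) < n_outliers:
        cand = rng.uniform(lo_e, hi_e, size=(n_outliers, X_target.shape[1]))
        dist = np.linalg.norm(cand[:, None] - X_target[None, :],
                              axis=-1).min(axis=1)
        # Keep only candidates that do not fall inside the target cloud.
        outliers.extend(cand[dist > theta])
    return np.asarray(outliers[:n_outliers])
```

After labelling the generated points as the outlier class, any off-the-shelf two-class InR technique can be run on the union of target samples and artificial counterexamples.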

One-Class Classification
Instance Reduction in Standard Classification
The Role of Instance Reduction in One-Class Classification
Applying Instance Reduction to One-Class Classification
Adapting Existing Instance Reduction Methods to One-Class Classification
Evolutionary Filter and Wrapper Methods for One-Class Instance Reduction
Scale Factor Local Search in Differential Evolution for Instance Reduction
Adapting SFLSDE to OCC
Datasets
Methods
Set-up
General Comments on Obtained Results
Results for One-Class Nearest Neighbor
Results for Minimum Spanning Tree Data Description
Results for Support Vector Data Description
Impact on the Computational Complexity
Conclusions and Future Works