Abstract

Instance-based classifiers become inefficient when their training dataset or model is large. Therefore, they are usually applied in conjunction with a Data Reduction Technique that collects prototypes from the available training data. The set of prototypes, called the condensing set, lowers the computational cost of classification without degrading accuracy. In the case of imbalanced training data, however, the number of prototypes collected for the minority (rare) classes may be insufficient; in the worst case, the rare classes may be eliminated altogether. This paper presents three methods that preserve the rare classes when data reduction is applied. Two of the methods apply data reduction only to the instances that belong to common classes, thereby avoiding costly under-sampling or over-sampling procedures for dealing with class imbalance. The third method applies SMOTE over-sampling before data reduction. The three methods were evaluated in experiments on twelve imbalanced datasets. The results reveal high recall and very good reduction rates.
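The core idea of the first two methods, reducing only the common classes while keeping rare-class instances intact, can be illustrated with a minimal sketch. The function name, the `rare_threshold` parameter, and the simple every-k-th thinning step below are illustrative assumptions; the paper's actual methods rely on a proper Data Reduction Technique for the common classes.

```python
import numpy as np

def reduce_preserving_rare(X, y, rare_threshold=10, keep_every=3):
    """Hypothetical sketch: condense common classes, preserve rare ones.

    Classes with fewer than `rare_threshold` instances are kept whole.
    Common classes are thinned by keeping every `keep_every`-th instance,
    a simple stand-in for a real prototype-selection algorithm.
    """
    X, y = np.asarray(X), np.asarray(y)
    keep = []
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        if len(idx) < rare_threshold:
            keep.extend(idx)                # rare class: keep all instances
        else:
            keep.extend(idx[::keep_every])  # common class: condense
    keep = np.array(sorted(keep))
    return X[keep], y[keep]
```

Because rare-class instances bypass the reduction step entirely, no minority class can be eliminated from the condensing set, which is the property the paper's methods aim to guarantee.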
