Types of minority class examples and their influence on learning classifiers from imbalanced data

Krystyna Napierala, Jerzy Stefanowski

doi:10.1007/s10844-015-0368-1

Open Access

https://doi.org/10.1007/s10844-015-0368-1

Abstract

Many real-world applications reveal difficulties in learning classifiers from imbalanced data. Although several methods for improving classifiers have been introduced, identifying the conditions under which a particular method can be used efficiently is still an open research problem. It is also worth studying the nature of imbalanced data, the characteristics of the minority class distribution and their influence on classification performance. However, current studies on imbalanced data difficulty factors have mainly been carried out with artificial datasets, and their conclusions are not easily applicable to real-world problems, partly because the methods for identifying these factors are not sufficiently developed. In our paper, we capture difficulties of class distribution in real datasets by considering four types of minority class examples: safe, borderline, rare and outliers. First, we confirm their occurrence in real data by exploring multidimensional visualizations of selected datasets. Then, we introduce a method for identifying these types of examples, which is based on analyzing the class distribution in a local neighbourhood of the considered example. Two ways of modeling this neighbourhood are presented: with k-nearest examples and with kernel functions. Experiments with artificial datasets show that these methods are able to rediscover simulated types of examples. Further contributions of this paper include a comprehensive experimental study with 26 real-world imbalanced datasets, where (1) we identify new data characteristics based on the analysis of types of minority examples; (2) we demonstrate that considering the results of this analysis allows one to differentiate the classification performance of popular classifiers and pre-processing methods and to evaluate their areas of competence. Finally, we highlight directions for exploiting the results of our analysis in developing new algorithms for learning classifiers and pre-processing methods.
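
The k-nearest-examples variant of the labelling idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `label_minority_examples`, the Euclidean distance, the default `k=5` and the fraction thresholds (0.7, 0.3, 0.1) are assumptions, since the abstract gives no concrete parameters. The kernel-based variant is not shown.

```python
import math


def label_minority_examples(X, y, minority, k=5):
    """Label each minority example by the class mix of its k nearest neighbours.

    Assumed thresholds on the fraction of same-class neighbours
    (not stated in the abstract): above 0.7 -> safe, above 0.3 -> borderline,
    above 0.1 -> rare, otherwise outlier. With k=5 this means
    4-5 same-class neighbours -> safe, 2-3 -> borderline, 1 -> rare, 0 -> outlier.
    """
    labels = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue  # only minority examples receive a type
        # Brute-force Euclidean distances to every other example.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        same = sum(1 for _, j in dists[:k] if y[j] == minority)
        frac = same / k
        if frac > 0.7:
            labels[i] = "safe"
        elif frac > 0.3:
            labels[i] = "borderline"
        elif frac > 0.1:
            labels[i] = "rare"
        else:
            labels[i] = "outlier"
    return labels
```

Calling `label_minority_examples(X, y, minority=1)` on a list of feature tuples `X` and class labels `y` returns a dictionary that maps the index of each minority example to its assumed type. The brute-force neighbour search is quadratic in the number of examples, so this sketch only suits small datasets.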

Highlights

In many real-life problems classifiers are faced with imbalanced data, which means that one of the target classes contains a much smaller number of instances than the other classes.
Class imbalance is an obstacle for learning classifiers, as they are biased toward the majority classes and tend to misclassify minority class examples.
We present visualisations obtained with the Multidimensional Scaling (MDS) projection of three imbalanced datasets from the UCI repository that are often used in experimental studies concerning class imbalance: thyroid, ecoli and cleveland (Fig. 1b, c and d).

Summary

Introduction

In many real-life problems classifiers are faced with imbalanced data, which means that one of the target classes contains a much smaller number of instances than the other classes. Class imbalances have been observed in many application problems such as detection of oil spills in satellite images, analysing financial risk, predicting technical equipment failures, managing network intrusion, text categorization and information filtering; for some reviews see, e.g., (He and Garcia 2009; He and Ma 2013). In all those problems the correct recognition of the minority class is of key importance. Class imbalance is an obstacle for learning classifiers, as they are biased toward the majority classes and tend to misclassify minority class examples.

Journal: Journal of Intelligent Information Systems
Publication Date: Jul 9, 2015
Citations: 219
License type: CC BY 4.0
