Multi-class Imbalanced Data Research Articles

Learning from imbalanced data, where observations of one class are significantly rarer than those of other classes, has gained considerable attention in the data mining community. Most existing literature focuses on the binary imbalanced case, while multi-class imbalanced learning is barely mentioned. Moreover, most proposed algorithms treat all imbalanced data alike and aim to handle them with a single versatile algorithm. In fact, imbalanced data sets vary in their imbalance ratio, dimension, and number of classes, and classifiers perform differently on different types of data sets. In this paper we propose an adaptive multiple classifier system, named AMCS, to cope with multi-class imbalanced learning; it distinguishes among different kinds of imbalanced data. AMCS includes three components: feature selection, resampling, and ensemble learning. Each component is selected specifically for the type of imbalanced data at hand. We consider two feature selection methods, three resampling mechanisms, five base classifiers, and five ensemble rules to construct a selection pool; the criterion for choosing each component from the pool to build AMCS is analyzed through empirical study. To verify the effectiveness of AMCS, we compare it with several state-of-the-art algorithms; the results show that AMCS outperforms or is comparable with the others. Finally, AMCS is applied to oil-bearing reservoir recognition. The results indicate that AMCS makes no mistakes in recognizing the characteristics of layers for the oilsk81-oilsk85 well logging data, which were collected in the Jianghan oilfield of China.
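The abstract names three properties by which imbalanced data sets differ: imbalance ratio, dimension, and number of classes. The sketch below shows one way these could be computed before choosing components. It is an illustration only, not the AMCS selection criterion, which the abstract does not specify. The function name `describe_imbalance` is hypothetical, and the imbalance ratio is assumed here to be the size of the largest class divided by the size of the smallest, a common convention.

```python
from collections import Counter


def describe_imbalance(X, y):
    """Summarize the properties the abstract says distinguish imbalanced data sets.

    X is a list of feature vectors, y the matching list of class labels.
    Assumption: imbalance ratio = largest class size / smallest class size.
    """
    counts = Counter(y)
    largest = max(counts.values())
    smallest = min(counts.values())
    return {
        "n_classes": len(counts),
        "n_features": len(X[0]) if X else 0,
        "imbalance_ratio": largest / smallest,
        "class_counts": dict(counts),
    }


# Toy three-class data set: 8, 3, and 1 observations with 2 features each.
X = [[0.0, 1.0]] * 8 + [[1.0, 0.0]] * 3 + [[2.0, 2.0]]
y = ["a"] * 8 + ["b"] * 3 + ["c"]
summary = describe_imbalance(X, y)
print(summary["n_classes"], summary["n_features"], summary["imbalance_ratio"])
```

A summary like this could feed whatever rule picks a feature selector, resampler, and ensemble from the selection pool. The rule itself would have to come from the paper's empirical study.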

Imbalanced problems are pervasive in many real-world applications. In imbalanced distributions, one or more classes of data, called minority classes, are under-represented compared to the other classes. This skewness in the underlying data distribution causes many difficulties for typical machine learning algorithms. The problem becomes even more complicated when machine learning algorithms must deal with multi-class imbalanced problems. The solutions proposed for the issues arising from imbalanced distributions generally fall into two main categories: data-oriented methods and model-based algorithms. Focusing on the latter, this paper suggests a blend of the boosting and over-sampling paradigms, called MDOBoost, to improve learning on multi-class imbalanced data sets. The over-sampling technique introduced and adopted in this paper, the Mahalanobis distance-based over-sampling technique (MDO for short), is incorporated into the boosting algorithm. The minority classes are over-sampled via MDO in such a way that the original minority class characteristics are largely preserved. Compared with SMOTE, the popular method in this field, MDO generates minority class examples that are more similar to the original class samples. Moreover, MDO provides a broader representation of minority class examples, which in turn causes the classifier to build larger decision regions. MDOBoost increases the generalization ability of a classifier, as indicated by its better results with the pruned version of the C4.5 classifier, unlike other over-sampling and boosting procedures, which have difficulties with pruned C4.5. MDOBoost is applied to real-world multi-class imbalanced benchmarks, and its performance is compared with several data-level and model-based algorithms. The empirical results and theoretical analyses reveal that MDOBoost outperforms popular class decomposition and over-sampling techniques in terms of MAUC, G-mean, and minority class recall.
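The idea of over-sampling while keeping Mahalanobis distances intact can be illustrated with a simplified sketch. This is not the authors' MDO procedure, whose details the abstract does not give. It assumes NumPy is available. Each synthetic point is placed at a random direction on the ellipsoid around the minority class mean that passes through a randomly chosen seed sample, so it has the same Mahalanobis distance from the mean as its seed. The function names and the small `ridge` term (added to keep the covariance invertible) are assumptions of this sketch.

```python
import numpy as np


def mahalanobis_oversample(X_min, n_new, rng=None, ridge=1e-6):
    """Return (synthetic, seeds); each synthetic row keeps its seed's
    Mahalanobis distance from the minority class mean."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    mu = X_min.mean(axis=0)
    # Assumption: a small ridge keeps the covariance positive definite.
    cov = np.cov(X_min, rowvar=False) + ridge * np.eye(d)
    L = np.linalg.cholesky(cov)  # cov = L @ L.T
    seeds = X_min[rng.integers(0, n, size=n_new)]
    # Whiten the seeds: z = L^{-1}(x - mu), so ||z|| is the Mahalanobis distance.
    z = np.linalg.solve(L, (seeds - mu).T).T
    radius = np.linalg.norm(z, axis=1, keepdims=True)
    # Random unit directions in the whitened space.
    u = rng.normal(size=(n_new, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Map back to the original space.
    synthetic = mu + (radius * u) @ L.T
    return synthetic, seeds


def mahalanobis_distance(X, mu, cov):
    """Mahalanobis distance of each row of X from mu under covariance cov."""
    diff = np.asarray(X, dtype=float) - mu
    return np.sqrt(np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff))


# Toy minority class: 30 correlated points in 3 dimensions.
data_rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
X_min = data_rng.normal(size=(30, 3)) @ mix
synthetic, seeds = mahalanobis_oversample(X_min, n_new=50, rng=1)
```

Because every synthetic point lies on the same density contour as its seed, the new samples follow the shape of the class covariance instead of lying on line segments between neighbors, as SMOTE's interpolated points do.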

Articles published on Multi-class Imbalanced Data

Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data

BPSO-Adaboost-KNN ensemble learning algorithm for multi-class imbalanced data classification

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm

Cluster-based sampling of multiclass imbalanced data

An Efficient Over-sampling Approach Based on Mean Square Error Back-propagation for Dealing with the Multi-class Imbalance Problem

A new approach for imbalanced data classification based on data gravitation

To combat multi-class imbalanced problems by means of over-sampling and boosting techniques

Parameter-free classification in multi-class imbalanced data sets

Dynamic sampling approach to training neural networks for multiclass imbalance classification

Multi-class protein fold classification using a new ensemble machine learning approach

Mix-ratio sampling: Classifying multiclass imbalanced mouse brain images using support vector machine
