A design of information granule-based under-sampling method in imbalanced data classification

Tianyu Liu, Zhiwu Li, Witold Pedrycz, Xiubin Zhu

doi:10.1007/s00500-020-05023-2

Abstract

In numerous real-world problems, we face difficulties in learning from imbalanced data. The classification performance of a “standard” classifier (learning algorithm) is evidently hindered by the imbalanced distribution of data. Over-sampling and under-sampling methods have been researched extensively with the aim of increasing the prediction accuracy over the minority class. However, traditional under-sampling methods tend to ignore important characteristics of the majority class. In this paper, a novel under-sampling method based on information granules is proposed. The method exploits the concepts and algorithms of granular computing. First, information granules are built around selected patterns from the majority class to capture the essence of the data belonging to this class. Subsequently, the resulting information granules are evaluated in terms of their quality, and those with the highest specificity values are selected. Next, the selected numeric data are augmented with weights implied by the size of the information granules. Finally, a support vector machine and a K-nearest-neighbor classifier, both regarded here as representative classifiers, are built on the weighted data. Experimental studies are carried out using synthetic data as well as a suite of imbalanced data sets from public machine learning repositories. The experimental results quantify the performance of the support vector machine and K-nearest-neighbor classifiers combined with the information granule-based under-sampling method. The results demonstrate its superiority over the same classifiers equipped with a conventional under-sampling method. In general, the improvement in performance, expressed in terms of G-means, exceeds 10% when information granule-based under-sampling is applied instead of random under-sampling.
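The pipeline outlined in the abstract (granules around majority-class patterns, selection by specificity, size-based weights, a weighted classifier, evaluation by G-mean) can be illustrated with a minimal sketch. It is not the authors' implementation: the hyperball granule shape, the coverage times specificity quality criterion (in the spirit of the principle of justifiable granularity), random selection of candidate prototypes, coverage-proportional weights and the weighted K-nearest-neighbor vote are all assumptions made for illustration, and the function names `build_granule`, `granular_undersample`, `weighted_knn_predict` and `g_mean` are hypothetical. G-mean is taken here as the geometric mean of the true positive and true negative rates.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_granule(center, majority, n_radii=20):
    """Build a hyperball granule around `center` (hypothetical helper).

    Assumption: the radius is picked from a grid to maximize
    coverage * specificity, with specificity = 1 - r / r_max.
    """
    d = [dist(center, p) for p in majority]
    r_max = max(d) or 1.0  # guard against all points coinciding
    best = None
    for i in range(1, n_radii + 1):
        r = r_max * i / n_radii
        coverage = sum(1 for x in d if x <= r)  # majority patterns inside the ball
        specificity = 1.0 - r / r_max
        quality = coverage * specificity
        if best is None or quality > best[0]:
            best = (quality, r, coverage, specificity)
    _, r, coverage, specificity = best
    return {"center": center, "radius": r,
            "coverage": coverage, "specificity": specificity}


def granular_undersample(majority, n_keep, seed=0):
    """Return n_keep (pattern, weight) pairs from the majority class.

    Assumptions: candidate prototypes are drawn at random, the most specific
    granules are kept, and weights are proportional to coverage (mean 1).
    """
    rng = random.Random(seed)
    n_candidates = min(len(majority), 3 * n_keep)
    centers = rng.sample(majority, n_candidates)
    granules = [build_granule(c, majority) for c in centers]
    granules.sort(key=lambda g: g["specificity"], reverse=True)
    chosen = granules[:n_keep]
    total = sum(g["coverage"] for g in chosen)
    return [(g["center"], n_keep * g["coverage"] / total) for g in chosen]


def weighted_knn_predict(x, train, k=3):
    """Weighted K-NN vote; `train` holds (pattern, label, weight) triples."""
    nearest = sorted(train, key=lambda t: dist(x, t[0]))[:k]
    votes = {}
    for _, label, w in nearest:
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and true negative rate."""
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    tpr = sum(p == positive for p in pos) / len(pos)
    tnr = sum(p != positive for p in neg) / len(neg)
    return math.sqrt(tpr * tnr)


# Usage on synthetic 2-D data: 200 majority vs 20 minority patterns.
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]
kept = granular_undersample(majority, n_keep=len(minority))
train = [(p, 0, w) for p, w in kept] + [(p, 1, 1.0) for p in minority]
print(weighted_knn_predict((3.0, 3.0), train))
```

The weighted K-NN vote stands in for either classifier mentioned in the abstract; a weighted support vector machine could be used instead, for example by passing the granule weights through the `sample_weight` argument of scikit-learn's `SVC.fit`.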


Journal: Soft Computing - A Fusion of Foundations, Methodologies and Applications
Publication Date: May 13, 2020
Citations: 15

Similar Papers

A novel progressively undersampling method based on the density peaks sequence for imbalanced data
Xiaoying Xie ... Wen Li
Knowledge Based Systems | VOL. 213
27 Dec 2020

A Comparative Study of Sampling Methods and Algorithms for Imbalanced Time Series Classification
Guohua Liang ... Chengqi Zhang
01 Jan 2012

CDBH: A clustering and density-based hybrid approach for imbalanced data classification
Behzad Mirzaei ... Hossein Nezamabadi-Pour
Expert Systems with Applications | VOL. 164
28 Sep 2020

A combination of clustering-based under-sampling with ensemble methods for solving imbalanced class problem in intelligent systems
Mohammad Saleh Ebrahimi Shahabadi ... Francesco Palmieri
Technological Forecasting & Social Change | VOL. 169
11 May 2021
