Equalization ensemble for large scale highly imbalanced data classification

Jinjun Ren, Yuping Wang, Mingqian Mao, Yiu-Ming Cheung

doi:10.1016/j.knosys.2022.108295

Abstract

The class-imbalance problem arises in many research fields. The larger the data set and the more severe the imbalance, the harder it is to classify the data correctly. For large-scale, highly imbalanced data sets, ensemble methods based on under-sampling are among the most competitive existing techniques. However, they are sensitive to improper sampling strategies, tend to lose useful information about the majority class, and produce models that do not generalize easily. To overcome these limitations, we propose an equalization ensemble method (EASE) with two new schemes. First, we propose an equalization under-sampling scheme that generates a balanced data set for each base classifier, which reduces the impact of class imbalance on the base classifiers. Second, we design a weighted integration scheme in which the G-mean scores obtained by the base classifiers on the original imbalanced data set are used as the weights. These weights let the better-performing base classifiers dominate the final classification decision, and they also adapt to imbalanced data sets of different scales while avoiding some extremely poor outcomes. Experimental results on three metrics show that EASE increases the diversity of the base classifiers and outperforms twelve state-of-the-art methods on imbalanced data sets of different scales.
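The weighted integration idea can be illustrated with a minimal sketch. The abstract does not specify the equalization under-sampling procedure, so plain random under-sampling is used here as a stand-in; the base learner `fit_stump`, the helper names, the toy data, and the 0.5 voting threshold are all assumptions made for illustration, not the authors' implementation. Labels are assumed binary, with 1 as the minority class.

```python
import math
import random

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (1 = minority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))

def balanced_subset(X, y, rng):
    """Stand-in sampler (assumption): all minority samples plus an equal
    number of randomly drawn majority samples."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    chosen = minority + rng.sample(majority, min(len(minority), len(majority)))
    return [X[i] for i in chosen], [y[i] for i in chosen]

def fit_stump(X, y):
    """Hypothetical base learner: threshold halfway between the class means
    of a single numeric feature."""
    n_pos = sum(y)
    mean_pos = sum(x for x, t in zip(X, y) if t == 1) / n_pos
    mean_neg = sum(x for x, t in zip(X, y) if t == 0) / (len(y) - n_pos)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x - threshold) > 0 else 0

def weighted_vote(models, weights, x):
    """Predict 1 when the weighted share of votes for class 1 is at least 0.5."""
    total = sum(weights)
    if total == 0:
        return 0
    score = sum(w * m(x) for m, w in zip(models, weights))
    return 1 if score / total >= 0.5 else 0

rng = random.Random(0)
# Toy 1-D data: 200 majority points around 0, 10 minority points around 3.
X = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(3, 1) for _ in range(10)]
y = [0] * 200 + [1] * 10

models, weights = [], []
for _ in range(5):
    Xs, ys = balanced_subset(X, y, rng)
    model = fit_stump(Xs, ys)
    models.append(model)
    # Weight each base classifier by its G-mean on the ORIGINAL imbalanced data.
    weights.append(g_mean(y, [model(x) for x in X]))

ensemble_pred = [weighted_vote(models, weights, x) for x in X]
print(round(g_mean(y, ensemble_pred), 3))
```

Because G-mean is the geometric mean of the per-class recalls, a base classifier that ignores either class receives a weight of zero, which is one way to read the abstract's claim that the weights guard against extremely poor outcomes.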

Journal: Knowledge-Based Systems
Publication Date: Jan 31, 2022
Citations: 21

Similar Papers

Imbalance Learning and Its Application on Medical Datasets
Yachao Shao
21 Feb 2022

A Hybrid SVM Model for the Classification of Imbalanced Data Sets
Jae Sik Lee ... Jong Gu Kwon
Journal of Intelligence and Information Systems | VOL. 19
30 Jun 2013

Adaptively Promoting Diversity in a Novel Ensemble Method for Imbalanced Credit-Risk Evaluation
Yitong Guo ... Zhiting Pan
Mathematics | VOL. 10
24 May 2022

A novel Random Forest integrated model for imbalanced data classification problem
Qinghua Gu ... Song Jiang
Knowledge-Based Systems | VOL. 250
21 May 2022
