Combining integrated sampling with SVM ensembles for learning from imbalanced datasets

Yang Liu, Xiaohui Yu, Jimmy Xiangji Huang, Aijun An

doi:10.1016/j.ipm.2010.11.007

Abstract

Learning from imbalanced datasets is difficult. The limited information available about the minority class impedes a clear understanding of the inherent structure of the dataset. Most existing classification methods tend to perform poorly on minority class examples when the dataset is extremely imbalanced, because they aim to optimize overall accuracy without considering the relative distribution of each class. In this paper, we study the performance of SVMs, which have achieved great success in many real applications, in the imbalanced data context. Through empirical analysis, we show that SVMs may suffer from biased decision boundaries, and that their prediction performance drops dramatically when the data is highly skewed. We propose to combine an integrated sampling technique, which incorporates both over-sampling and under-sampling, with an ensemble of SVMs to improve the prediction performance. Extensive experiments show that our method outperforms individual SVMs as well as several other state-of-the-art classifiers.
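The general idea in the abstract, pairing over-sampling of the minority class with under-sampling of the majority class and training one classifier per resampled set, might be sketched roughly as follows. The abstract does not give the procedure, so every detail here is an assumption: over-sampling is done by plain random duplication (a stand-in for a synthetic method), under-sampling is random without replacement, predictions are combined by majority vote, and `fit_centroid` is a hypothetical nearest-centroid learner used only so the sketch runs without third-party packages. An SVM trainer with the same interface could be passed as `fit` instead. The function names and parameters are illustrative, not taken from the paper.

```python
import random
from collections import Counter


def integrated_samples(majority, minority, n_models, oversample_rate=2.0, seed=0):
    """Build n_models balanced (majority_subset, grown_minority) training sets.

    Assumption: the minority class is grown by random duplication and the
    majority class is randomly under-sampled to the same size.
    """
    rng = random.Random(seed)
    target = max(len(minority), int(len(minority) * oversample_rate))
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    grown_minority = list(minority) + extra
    k = min(len(majority), len(grown_minority))
    # Each model sees a different random slice of the majority class.
    return [(rng.sample(majority, k), grown_minority) for _ in range(n_models)]


def fit_centroid(neg, pos):
    """Hypothetical stand-in base learner: nearest class centroid (not an SVM)."""
    def mean(points):
        return [sum(col) / len(points) for col in zip(*points)]

    c_neg, c_pos = mean(neg), mean(pos)

    def predict(x):
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        return 1 if d_pos < d_neg else 0  # 1 = minority class, 0 = majority class

    return predict


def fit_ensemble(majority, minority, n_models=5, fit=fit_centroid, **kw):
    """Train one base learner per resampled set and combine them by majority vote."""
    models = [fit(neg, pos)
              for neg, pos in integrated_samples(majority, minority, n_models, **kw)]

    def predict(x):
        # An odd n_models avoids ties for binary labels.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data: 100 majority points near the origin, 5 minority points near (5, 5).
majority = [((i % 5) * 0.1, (i % 7) * 0.1) for i in range(100)]
minority = [(5.0 + i * 0.1, 5.0) for i in range(5)]
predict = fit_ensemble(majority, minority, n_models=5)
print(predict((5.0, 5.0)), predict((0.0, 0.0)))
```

Because each ensemble member is trained on a different under-sampled slice of the majority class, the ensemble as a whole can still draw on most of the majority data even though every individual training set is balanced.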

Journal: Information Processing & Management
Publication Date: Dec 17, 2010
Citations: 134

Similar Papers

Boosting Prediction Accuracy on Imbalanced Datasets with SVM Ensembles
Yang Liu ... Aijun An
01 Jan 2006

Imbalance Learning and Its Application on Medical Datasets
Yachao Shao
21 Feb 2022

SMOTEBoost: Improving Prediction of the Minority Class in Boosting
Nitesh V Chawla ... Kevin W Bowyer
01 Jan 2003

Fuzzy–synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets
Yanping Xu ... Xinxin Niu
International Journal of Distributed Sensor Networks | VOL. 13
01 Apr 2017
