Abstract

This paper compares the performance of several boosting and bagging techniques for learning from imbalanced and noisy binary-class data. Noise and class imbalance are two well-established data characteristics encountered across a wide range of data mining and machine learning applications. The algorithms studied here, which include SMOTEBoost, RUSBoost, Exactly Balanced Bagging, and Roughly Balanced Bagging, combine boosting or bagging with data sampling to make them more effective on imbalanced data. These techniques are evaluated in a comprehensive suite of experiments in which nearly four million classification models were trained. All classifiers are assessed using seven performance metrics, providing a well-rounded view of their behavior, and results are tested for statistical significance via analysis-of-variance modeling. The experiments show that the bagging techniques generally outperform boosting, making bagging the preferred method for handling class imbalance in noisy data environments.
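To illustrate the family of methods being compared, the sketch below shows the core idea of Exactly Balanced Bagging: each base learner is trained on all minority-class examples plus an equal-sized random sample of the majority class, and predictions are combined by majority vote. This is a minimal sketch, not the paper's implementation; it assumes binary 0/1 labels and uses scikit-learn decision trees as base learners, and the function names and parameters are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def exactly_balanced_bagging(X, y, n_estimators=10, random_state=0):
    """Train an Exactly Balanced Bagging ensemble for binary classification.

    Each base learner sees every minority-class example plus an
    equal-sized sample (drawn without replacement) of the majority class.
    """
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)

    estimators = []
    for _ in range(n_estimators):
        # Balance each training sample exactly: |majority sample| == |minority|.
        sampled_maj = rng.choice(maj_idx, size=min_idx.size, replace=False)
        idx = np.concatenate([min_idx, sampled_maj])
        estimators.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return estimators

def predict_majority_vote(estimators, X):
    """Aggregate base-learner predictions by unweighted majority vote.

    Assumes binary 0/1 labels, so the vote reduces to thresholding the mean.
    """
    votes = np.stack([est.predict(X) for est in estimators])
    return (votes.mean(axis=0) >= 0.5).astype(int)

if __name__ == "__main__":
    # Hypothetical imbalanced dataset (roughly 10% minority class).
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    ensemble = exactly_balanced_bagging(X, y, n_estimators=25)
    preds = predict_majority_vote(ensemble, X)
```

Roughly Balanced Bagging differs only in how the majority-class sample size is chosen (drawn from a negative binomial distribution rather than fixed at the minority-class size), while the boosting variants instead apply sampling within each boosting iteration.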
