Abstract

Imbalanced datasets arise from the uneven distribution of real-world data, such as the disposition of complaints across government offices in Bandung. Consequently, multi-label text categorization algorithms may not perform at their best, because classifiers tend to be dominated by the majority of the data and to ignore the minority. In this paper, Bagging and Adaptive Boosting algorithms are employed to address this issue and improve the performance of text categorization. The results are evaluated with four metrics: Hamming loss, subset accuracy, example-based accuracy, and micro-averaged F-measure. Bagging ML-LP with an SMO weak classifier is the best performer in terms of subset accuracy and example-based accuracy, while Bagging ML-BR with an SMO weak classifier achieves the best micro-averaged F-measure overall. On the other hand, AdaBoost.MH with a J48 weak classifier yields the lowest Hamming loss. Thus, both algorithms have high potential to boost the performance of text categorization, but only with certain weak classifiers. Moreover, bagging shows more potential than adaptive boosting for increasing the accuracy of minority labels.
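
To make the evaluation protocol concrete, the following is a minimal Python sketch of a bagged binary-relevance classifier scored with the four metrics named above. It uses scikit-learn and synthetic data as stand-ins; the paper's own experiments (SMO, J48, ML-BR, ML-LP) imply a Weka/MULAN-style toolchain, so the classifier choices and dataset here are assumptions, not the authors' setup.

# Minimal sketch: bagged binary-relevance multi-label classification,
# evaluated with Hamming loss, subset accuracy, example-based accuracy,
# and micro-averaged F-measure. Assumption: scikit-learn stands in for
# the Weka/MULAN SMO-based configuration used in the paper.
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.metrics import hamming_loss, accuracy_score, jaccard_score, f1_score

# Synthetic multi-label data; a real run would use the complaint corpus.
X, Y = make_multilabel_classification(n_samples=500, n_features=50,
                                      n_classes=8, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3,
                                                    random_state=42)

# Binary relevance (one classifier per label), each label handled by a
# bagged SVM (SVC trains with an SMO-style solver, though it is not
# identical to Weka's SMO implementation).
model = OneVsRestClassifier(BaggingClassifier(SVC(), n_estimators=10,
                                              random_state=42))
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

print("Hamming loss:          ", hamming_loss(Y_test, Y_pred))
print("Subset accuracy:       ", accuracy_score(Y_test, Y_pred))
# Example-based accuracy computed here as the per-sample Jaccard index.
print("Example-based accuracy:", jaccard_score(Y_test, Y_pred, average="samples"))
print("Micro-averaged F1:     ", f1_score(Y_test, Y_pred, average="micro"))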
