Breaking the Curse of Class Imbalance: Bangla Text Classification

Md Rafi-Ur-Rashid, Muhammad Abdullah Adnan, Mahim Mahbub

doi:10.1145/3511601

Abstract

This article addresses the class imbalance problem in Bengali, a low-resource language. As a use case, we choose one of the most fundamental NLP tasks, text classification, and use three benchmark text corpora: a fake-news dataset, a sentiment analysis dataset, and a song lyrics dataset. Each of them is severely class-imbalanced. We tackle the problem with several strategies. We augment the data with synthetic samples, produced via text generation and embedding generation, to increase the proportion of minority samples. We ensemble deep learning models by subsetting the majority samples. We apply the focal loss function for class-imbalanced classification. We also apply outlier detection, data resampling, and hidden feature extraction to improve the minority-class F1 score. All of our experiments rely entirely on textual content analysis and yield a minority-class F1 score of more than 90% on each of the three tasks, a strong result for such highly class-imbalanced datasets.
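The abstract names the focal loss but gives neither its formula nor the hyperparameters used. As an illustration only, the sketch below implements the commonly used binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the defaults gamma = 2.0 and alpha = 0.25 are assumptions taken from the standard formulation, not values reported by the authors.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single example (illustrative sketch).

    p     -- predicted probability of the positive (here: minority) class
    y     -- true label, 1 for positive and 0 for negative
    gamma -- focusing parameter; 0 reduces to weighted cross-entropy (assumed default)
    alpha -- weight of the positive class (assumed default)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Plain binary cross-entropy for a single example, for comparison."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)


if __name__ == "__main__":
    # A confidently correct example versus a badly misclassified one.
    for p in (0.9, 0.1):
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

The factor (1 - p_t)^gamma shrinks the loss of examples the model already classifies confidently, so training signal concentrates on hard examples, which in imbalanced data are often minority samples.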

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Apr 29, 2022
Citations: 4
