Abstract
Background: Imbalanced datasets present a significant challenge in machine learning, often producing biased models that favor the majority class. Oversampling techniques such as SMOTE, Borderline SMOTE, and ADASYN attempt to mitigate this problem. This study investigates these techniques in combination with machine learning (ML) models such as Support Vector Machine (SVM), Decision Tree, and Logistic Regression. The results reveal critical challenges, such as noise amplification and overfitting, which we address by refining the oversampling approaches to improve model performance and generalization.
Aim: To address class imbalance, the minority class is oversampled so that it is balanced against the majority class. This work uses the oversampling techniques SMOTE (Synthetic Minority Oversampling Technique), Borderline SMOTE, and ADASYN (Adaptive Synthetic Sampling).
Objective: To perform a comprehensive analysis of various oversampling methods for handling the class imbalance problem with ML methods.
Method: The proposed methodology uses BERT, which removes the need for a separate pre-processing step. Several oversampling techniques proposed in the literature are used to balance the data, followed by feature extraction and then text classification with ML algorithms. Experiments categorize the data using the ML classifiers Decision Tree (DT), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF).
Result: The results show an improvement for SVM combined with Borderline SMOTE, which achieved an accuracy of 71.9% and a Matthews correlation coefficient (MCC) of 0.53.
Conclusion: By addressing the basic issue of class imbalance, the suggested method contributes to the development of fairer and more effective ML models.
More From: Recent Advances in Computer Science and Communications