Imbalanced data classification using improved synthetic minority over-sampling technique

Yamijala Anusha, Konda Srinivas, R Visalakshi

doi:10.3233/mgs-230007

Abstract

In data mining, deep learning and machine learning models suffer from class imbalance, which lowers the detection rate for minority-class samples. This paper introduces an improved Synthetic Minority Over-sampling Technique (SMOTE) for effective classification of imbalanced data. Raw data are collected from the PIMA, Yeast, E.coli, and Breast Cancer Wisconsin databases and pre-processed with min-max normalization, cleaning, integration, and data transformation to improve the uniqueness, consistency, completeness, and validity of the data. The improved SMOTE algorithm is then applied to the pre-processed data to balance the class distribution, and the balanced data are fed to three machine learning classifiers for classification: Support Vector Machine (SVM), Random Forest, and Decision Tree. Experiments show that the improved SMOTE algorithm combined with Random Forest achieved an Area Under the Curve (AUC) of 94.30%, 91%, 96.40%, and 99.40% on the PIMA, Yeast, E.coli, and Breast Cancer Wisconsin databases, respectively.
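The abstract does not specify what the improved SMOTE changes, so the sketch below shows only the standard SMOTE idea it builds on, preceded by min-max normalization: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function names `min_max_normalize` and `smote`, the default k = 5, the Euclidean distance, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote(minority, n_synthetic, k=5, seed=0):
    """Standard SMOTE sketch: interpolate between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    (The original formulation instead generates a fixed number of samples
    per minority point; random base selection is a common variant.)"""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: class 1 is the minority (3 of 7 samples).
X = min_max_normalize([[1, 200], [2, 180], [3, 220], [8, 50], [9, 40], [10, 60], [11, 55]])
y = [1, 1, 1, 0, 0, 0, 0]
minority = [x for x, label in zip(X, y) if label == 1]
new_points = smote(minority, n_synthetic=1, k=2)
X_balanced = X + new_points
y_balanced = y + [1] * len(new_points)
```

Normalizing before oversampling matters in this sketch because the neighbour search uses Euclidean distance, so an unscaled feature with a large range (the second column of the toy data) would otherwise dominate which neighbours are chosen.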

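The results above are reported as AUC. As a hedged illustration of how that metric is computed from a classifier's scores (for example, a random forest's predicted minority-class probabilities), the sketch below uses the rank interpretation of AUC: the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half. The function `roc_auc` and the four scores are hypothetical and do not reproduce the paper's experiments.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive (label 1) is scored above a random negative (label 0).
    Ties contribute 0.5, matching the trapezoidal ROC area."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative scores only: 3 of the 4 positive/negative pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(f"AUC = {auc:.2%}")  # prints "AUC = 75.00%"
```

The pairwise loop costs O(n_pos * n_neg), which is fine for small examples; library implementations such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.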
Published in: Multiagent and Grid Systems

Similar Papers

Automated Classification of Tropical Plant Species Data Based on Machine Learning Techniques and Leaf Trait Measurements
Burhan Rashid Hussein ... Wee-Hong Ong
31 Aug 2019

Modeling of class imbalance using an empirical approach with spambase dataset and random forest classification
Kiranmayi Kotipalli ... Shan Suthaharan
13 Oct 2014

Classification of Imbalanced Data by Using the SMOTE Algorithm and Locally Linear Embedding
Juanjuan Wang ... Jiwu Zhang
01 Jan 2006

Effective Prediction of Type II Diabetes Mellitus Using Data Mining Classifiers and SMOTE
Mirza Shuja ... Majid Zaman
01 Jan 2020
