Enhancing Machine Learning Models Through PCA, SMOTE-ENN, and Stochastic Weighted Averaging

Youngjin Han, Inwhee Joe

doi:10.3390/app14219772

Abstract

Predicting survival outcomes in critical accidents has been a focal point of machine learning research. This study addresses several limitations of existing methods: insufficient handling of class imbalance, little attention to hyperparameter tuning, and a tendency to overfit. Many existing models generalize poorly on imbalanced datasets or rely on default hyperparameter settings, which leads to biased predictions. The proposed approach integrates Principal Component Analysis (PCA), hyperparameter optimization, and a resampling method that combines the Synthetic Minority Oversampling Technique (SMOTE) with Edited Nearest Neighbors (ENN), which significantly improves predictive accuracy and generalization. An ensemble of seven machine learning algorithms, namely Logistic Regression, Support Vector Machine, K-Nearest Neighbors (KNN), Random Forest, XGBoost, LightGBM, and CatBoost, was applied to predict survival outcomes, and Stochastic Weighted Averaging (SWA) was applied to mitigate overfitting and improve generalization. In this setting, applying SWA raised accuracy from 91.97% to 94.89%. Together, PCA-based dimensionality reduction, hyperparameter tuning, and SMOTE-ENN resampling enabled the model to handle class imbalance while optimizing predictive accuracy. The final model achieved an Area Under the Curve (AUC) and an Average Precision (AP) of 0.98 each, indicating strong discrimination between classes and high precision. These improvements were validated on the Titanic dataset, framed as a binary classification problem of predicting passenger survival. The results indicate that ensemble learning enhanced by SWA offers a powerful framework for imbalanced and complex datasets and substantially improves predictive accuracy. This study provides insights into how machine learning techniques can be effectively combined to solve classification challenges in real-world scenarios.
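The abstract names PCA and SMOTE-ENN without showing how they fit together. The following is a minimal NumPy-only sketch under stated assumptions, not the authors' pipeline: the helper names (`pca_fit_transform`, `smote`, `enn`, `smote_enn`), the neighbor counts (k=5 for SMOTE, k=3 for ENN), the PCA-then-resample order, and the toy data are all hypothetical choices for illustration. The ENN step uses the majority-vote rule from Wilson's original formulation. Hyperparameter search, the seven-model ensemble, and SWA are deliberately left out.

```python
import numpy as np

# Hypothetical helper names: an illustrative sketch, not the paper's implementation.


def pca_fit_transform(X, n_components):
    """Project X onto its top principal components (center, then SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def _neighbor_indices(X, k):
    """Indices of the k nearest neighbors of every row, excluding the row itself."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def smote(X, y, minority_label, k=5, seed=None):
    """Add synthetic minority samples until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    if n_needed <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    neighbors = _neighbor_indices(X_min, k)
    base = rng.integers(0, len(X_min), size=n_needed)
    partner = neighbors[base, rng.integers(0, k, size=n_needed)]
    # Place each new point at a random position on the segment base -> partner.
    gap = rng.random((n_needed, 1))
    synthetic = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_out, y_out


def enn(X, y, k=3):
    """Drop samples whose label disagrees with the majority label of their k neighbors."""
    neighbors = _neighbor_indices(X, k)
    keep = np.empty(len(y), dtype=bool)
    for i, idx in enumerate(neighbors):
        labels, counts = np.unique(y[idx], return_counts=True)
        keep[i] = labels[np.argmax(counts)] == y[i]
    return X[keep], y[keep]


def smote_enn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Oversample with SMOTE, then clean the class-overlap region with ENN."""
    X_res, y_res = smote(X, y, minority_label, k=k_smote, seed=seed)
    return enn(X_res, y_res, k=k_enn)


# Toy imbalanced data (90 vs 10 samples); a stand-in, not the Titanic dataset.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (90, 4)), rng.normal(2.0, 1.0, (10, 4))])
y = np.array([0] * 90 + [1] * 10)

Z = pca_fit_transform(X, n_components=2)
X_res, y_res = smote_enn(Z, y, minority_label=1)
print("before:", np.bincount(y), "after:", np.bincount(y_res))
```

SMOTE first balances the classes by interpolating between minority neighbors; ENN then removes points (of either class) that sit in the overlap region, so the final class counts are close to, but usually not exactly, balanced. In a real evaluation the resampling would be applied to the training split only, so synthetic points never reach the test data; the imbalanced-learn library packages the same two-step idea as `imblearn.combine.SMOTEENN`.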

Journal: Applied Sciences
Publication Date: Oct 25, 2024
License type: CC BY 4.0

Similar Papers

Automated semiconductor wafer defect classification dealing with imbalanced data
Po-Hsuan Lee ... Zhe Wang
20 Mar 2020

Comparative analysis of resampling algorithms in the prediction of stroke diseases
Dauda Sani Abdullahi ... Dr Muhammad Sirajo Aliyu
UMYU Scientifica | VOL. 2
30 Mar 2023

Comparative Multinomial Text Classification Analysis of Naïve Bayes and XGBoost with SMOTE on Imbalanced Dataset
Ashish Chaturvedi ... Santosh Yadav
05 Sep 2021

B-111 Advancing Precision Medicine in Multiple Myeloma: Addressing Demographic Variabilities and Imbalanced Data in the NIH All of Us Research Program Cohort
T A Houze ... J Mcclain
Clinical Chemistry | VOL. 70
02 Oct 2024
