Abstract
Machine learning classification models often struggle with imbalanced datasets, leading to poor performance on minority classes. While preprocessing approaches like resampling can improve minority class detection, they may introduce sampling bias and reduce model explainability. This study proposes a novel method combining random undersampling (RUS) with knowledge distillation (KD) to enhance both predictive performance and explainability stability for imbalanced data classification. Our approach employs a two-step learning process: (1) training a balanced teacher model using RUS and (2) training a student model on the imbalanced data through response-based KD, utilizing both soft and hard targets. We hypothesize that this method mitigates class imbalance while preserving important information from the original dataset. We evaluated our proposed model against baseline and RUS-only models using five diverse imbalanced datasets from various domains. Performance was assessed using stratified 10-fold cross-validation with ROC-AUC and PR-AUC scores. Explainability stability was measured by the cosine similarity of SHAP values across cross-validation folds. Results demonstrate that our proposed model consistently outperforms both baseline and RUS-only models in terms of ROC-AUC and PR-AUC scores across all datasets. Moreover, it exhibits superior explainability stability in the majority of cases, addressing the sampling bias issue associated with traditional resampling methods. This research contributes to the field of machine learning by offering a novel approach that simultaneously improves predictive performance and maintains explainability for imbalanced data classification.
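As a rough illustration of the two-step process summarized above, the sketch below trains a teacher on a RUS-balanced subset and then trains a student on the full imbalanced data using a response-based KD loss that blends soft targets (the teacher's softened predictions) with hard targets (the true labels). The network architecture, optimizer, temperature `T`, and loss weight `alpha` are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of the two-step RUS + knowledge-distillation scheme.
# Architecture, alpha, temperature T, and training budget are assumptions.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_undersample(X, y, seed=0):
    """Balance a binary dataset by randomly discarding majority-class rows."""
    rng = np.random.default_rng(seed)
    idx_min = np.flatnonzero(y == 1)
    idx_maj = np.flatnonzero(y == 0)
    keep_maj = rng.choice(idx_maj, size=len(idx_min), replace=False)
    idx = np.concatenate([idx_min, keep_maj])
    return X[idx], y[idx]


class MLP(nn.Module):
    def __init__(self, n_features, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                 nn.Linear(32, n_classes))

    def forward(self, x):
        return self.net(x)


def train(model, X, y, soft_targets=None, alpha=0.5, T=2.0, epochs=100, lr=1e-2):
    """Train with hard labels; if soft_targets is given, add a response-based
    KD term (KL divergence between softened student and teacher outputs)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    X_t = torch.tensor(X, dtype=torch.float32)
    y_t = torch.tensor(y, dtype=torch.long)
    for _ in range(epochs):
        opt.zero_grad()
        logits = model(X_t)
        loss = F.cross_entropy(logits, y_t)            # hard-target term
        if soft_targets is not None:                   # soft-target (KD) term
            kd = F.kl_div(F.log_softmax(logits / T, dim=1),
                          soft_targets, reduction="batchmean") * T * T
            loss = alpha * kd + (1 - alpha) * loss
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Synthetic imbalanced data (~5% minority class) for demonstration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10)).astype(np.float32)
    y = (rng.random(2000) < 0.05).astype(int)

    # Step 1: teacher trained on a RUS-balanced subset.
    X_bal, y_bal = random_undersample(X, y)
    teacher = train(MLP(10), X_bal, y_bal)

    # Step 2: student trained on the full imbalanced data, guided by the
    # teacher's softened predictions plus the true (hard) labels.
    with torch.no_grad():
        soft = F.softmax(teacher(torch.tensor(X)) / 2.0, dim=1)
    student = train(MLP(10), X, y, soft_targets=soft)
```

In this sketch the student sees every original example, so no information is discarded, while the teacher's soft targets carry the balanced decision boundary learned after RUS, which is the intuition behind combining the two steps.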