Abstract
Machine learning classification models often struggle with imbalanced datasets, leading to poor performance on minority classes. While preprocessing approaches like resampling can improve minority class detection, they may introduce sampling bias and reduce model explainability. This study proposes a novel method combining random undersampling (RUS) with knowledge distillation (KD) to enhance both predictive performance and explainability stability for imbalanced data classification. Our approach employs a two-step learning process: (1) training a balanced teacher model using RUS and (2) training a student model on the imbalanced data through response-based KD, utilizing both soft and hard targets. We hypothesize that this method mitigates class imbalance while preserving important information from the original dataset. We evaluated our proposed model against baseline and RUS-only models using five diverse imbalanced datasets from various domains. Performance was assessed using stratified 10-fold cross-validation with ROC-AUC and PR-AUC scores. Explainability stability was measured by the cosine similarity of SHAP values across cross-validation folds. Results demonstrate that our proposed model consistently outperforms both baseline and RUS-only models in terms of ROC-AUC and PR-AUC scores across all datasets. Moreover, it exhibits superior explainability stability in the majority of cases, addressing the sampling bias issue associated with traditional resampling methods. This research contributes to the field of machine learning by offering a novel approach that simultaneously improves predictive performance and maintains explainability for imbalanced data classification.
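As a rough illustration of the two-step process summarized above, the sketch below trains a teacher on a RUS-balanced subset and then trains a student on the full imbalanced data using a response-based KD loss that blends soft targets (the teacher's softened predictions) with hard targets (the true labels). The network architecture, optimizer, temperature `T`, and loss weight `alpha` are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of the two-step RUS + knowledge-distillation scheme.
# Architecture, alpha, temperature T, and training budget are assumptions.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_undersample(X, y, seed=0):
    """Balance a binary dataset by randomly discarding majority-class rows."""
    rng = np.random.default_rng(seed)
    idx_min = np.flatnonzero(y == 1)
    idx_maj = np.flatnonzero(y == 0)
    keep_maj = rng.choice(idx_maj, size=len(idx_min), replace=False)
    idx = np.concatenate([idx_min, keep_maj])
    return X[idx], y[idx]


class MLP(nn.Module):
    def __init__(self, n_features, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                 nn.Linear(32, n_classes))

    def forward(self, x):
        return self.net(x)


def train(model, X, y, soft_targets=None, alpha=0.5, T=2.0, epochs=100, lr=1e-2):
    """Train with hard labels; if soft_targets is given, add a response-based
    KD term (KL divergence between softened student and teacher outputs)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    X_t = torch.tensor(X, dtype=torch.float32)
    y_t = torch.tensor(y, dtype=torch.long)
    for _ in range(epochs):
        opt.zero_grad()
        logits = model(X_t)
        loss = F.cross_entropy(logits, y_t)            # hard-target term
        if soft_targets is not None:                   # soft-target (KD) term
            kd = F.kl_div(F.log_softmax(logits / T, dim=1),
                          soft_targets, reduction="batchmean") * T * T
            loss = alpha * kd + (1 - alpha) * loss
        loss.backward()
        opt.step()
    return model


if __name__ == "__main__":
    # Synthetic imbalanced data (~5% minority class) for demonstration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10)).astype(np.float32)
    y = (rng.random(2000) < 0.05).astype(int)

    # Step 1: teacher trained on a RUS-balanced subset.
    X_bal, y_bal = random_undersample(X, y)
    teacher = train(MLP(10), X_bal, y_bal)

    # Step 2: student trained on the full imbalanced data, guided by the
    # teacher's softened predictions plus the true (hard) labels.
    with torch.no_grad():
        soft = F.softmax(teacher(torch.tensor(X)) / 2.0, dim=1)
    student = train(MLP(10), X, y, soft_targets=soft)
```

In this sketch the student sees every original example, so no information is discarded, while the teacher's soft targets carry the balanced decision boundary learned after RUS, which is the intuition behind combining the two steps.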