A self-inspected adaptive SMOTE algorithm (SASMOTE) for highly imbalanced data classification in healthcare

Tanapol Kosolwattana, Shizhong Han, Hua Chen, Chenang Liu, Renjie Hu, Ying Lin

doi:10.1186/s13040-023-00330-4

Abstract

In many healthcare applications, datasets for classification may be highly imbalanced because target events such as disease onset occur rarely. The SMOTE (Synthetic Minority Over-sampling Technique) algorithm has been developed as an effective resampling method for imbalanced data classification: it oversamples the minority class by generating synthetic samples. However, samples generated by SMOTE may be ambiguous, of low quality, and not separable from the majority class. To improve the quality of the generated samples, we propose a novel self-inspected adaptive SMOTE (SASMOTE) model that uses an adaptive nearest-neighborhood selection algorithm to identify the “visible” nearest neighbors, which are then used to generate samples likely to fall into the minority class. To further improve sample quality, SASMOTE introduces an uncertainty elimination step via self-inspection, which filters out generated samples that are highly uncertain and inseparable from the majority class. The proposed algorithm is compared with existing SMOTE-based algorithms, and its effectiveness is demonstrated through two real-world case studies in healthcare: risk gene discovery and fatal congenital heart disease prediction. By generating higher-quality synthetic samples, the proposed algorithm helps achieve better prediction performance (in terms of F1 score) on average than the other methods, which is promising for enhancing the usability of machine learning models on highly imbalanced healthcare data.
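The basic SMOTE step that SASMOTE refines can be sketched in a few lines. This is a minimal sketch, assuming plain Euclidean distance, pure-Python lists of features, and labels 1 = minority and 0 = majority; the function names smote and filter_uncertain are hypothetical. The filter is a simple k-nearest-neighbour vote used only as an illustrative stand-in: it is not the paper's adaptive "visible" neighbour selection or its self-inspection step, whose details the abstract does not give.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        lam = rng.random()  # interpolation factor drawn from [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def filter_uncertain(synthetic, X, y, k=5, max_majority_frac=0.5):
    """HYPOTHETICAL stand-in, not the SASMOTE self-inspection step:
    drop a synthetic point when more than max_majority_frac of its
    k nearest real samples carry the majority label (0)."""
    kept = []
    for s in synthetic:
        nearest = sorted(range(len(X)), key=lambda j: math.dist(s, X[j]))[:k]
        majority_frac = sum(1 for j in nearest if y[j] == 0) / len(nearest)
        if majority_frac <= max_majority_frac:
            kept.append(s)
    return kept


if __name__ == "__main__":
    # Toy data: majority (label 0) near the origin, minority (label 1) near (5, 5).
    majority = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.8], [1.0, 0.5]]
    minority = [[5.0, 5.0], [5.5, 4.8], [4.8, 5.6]]
    X, y = majority + minority, [0] * len(majority) + [1] * len(minority)
    new = smote(minority, n_new=10, k=2, seed=42)
    kept = filter_uncertain(new, X, y, k=3)
    print(len(new), "generated,", len(kept), "kept")
```

On this toy data every synthetic point falls between minority samples, so all of them survive the filter, while a point generated near the origin would be dropped. Plain SMOTE uses a fixed k for every sample; per the abstract, SASMOTE instead selects neighbours adaptively, which this sketch does not attempt to reproduce.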

Journal: BioData Mining	Publication Date: Apr 25, 2023
Citations: 18	License type: open-access

Similar Papers

Modeling of class imbalance using an empirical approach with spambase dataset and random forest classification
Kiranmayi Kotipalli ... Shan Suthaharan
13 Oct 2014

Abstention-SMOTE
Cheng Zhang ... Xiaodong Zhao
27 Dec 2017

Highly imbalanced fault classification of wind turbines using data resampling and hybrid ensemble method approach
Subhajit Chatterjee ... Yung-Cheol Byun
Engineering Applications of Artificial Intelligence | VOL. 126 | 12 Sep 2023

A Rebalancing Framework for Classification of Imbalanced Medical Appointment No-show Data
Ulagapriya Krishnan ... Pushpa Sangar
Journal of Data and Information Science | VOL. 6 | 27 Jan 2021