The effect of data resampling methods in radiomics

Aydin Demircioğlu

doi:10.1038/s41598-024-53491-5

Abstract

Radiomic datasets can be class-imbalanced, for instance, when the prevalence of diseases varies notably, so that positive samples are far fewer than negative samples. In these cases, the majority class may dominate training and bias the model, reducing its predictive performance. Resampling methods are therefore often used to balance the classes. However, several resampling methods exist, and neither their relative predictive performance nor their impact on feature selection has been systematically analyzed. In this study, we aimed to measure the impact of nine resampling methods on the predictive performance of radiomic models, using fifteen publicly available datasets. Furthermore, we evaluated the agreement and similarity of the selected feature sets. Our results show that, on average, applying resampling methods did not improve predictive performance. On specific datasets, slight improvements in predictive performance (+0.015 in AUC) were seen. The selected feature sets disagreed considerably (only 28.7% of features agreed), which strongly impedes feature interpretability. However, the selected features are similar when their correlation is considered (82.9% of features correlated on average).
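
To make the two ideas in the abstract concrete, here is a minimal sketch on synthetic data. It is not the study's pipeline: random oversampling is used only as one simple example of a resampling method, the top-k correlation filter stands in for a feature selection method, and both the Jaccard index for "agreement" and the 0.8 correlation threshold for "similarity" are assumptions made for illustration, since the abstract does not define these measures. All function names are hypothetical.

```python
import numpy as np


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority-class rows until both classes match in size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]


def top_k_features(X, y, k):
    """Toy filter selection (assumption): rank features by |Pearson r| with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return set(np.argsort(-np.abs(r))[:k].tolist())


def jaccard(a, b):
    """Agreement of two feature sets: shared features over all selected features."""
    return len(a & b) / len(a | b)


def correlated_fraction(X, a, b, threshold=0.8):
    """Fraction of features in `a` with at least one feature in `b` at |r| >= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    hits = [any(corr[i, j] >= threshold for j in b) for i in a]
    return sum(hits) / len(hits)


# Synthetic, imbalanced data: 200 samples, 20 features, 10% positives.
rng = np.random.default_rng(0)
y = np.zeros(200, dtype=int)
y[:20] = 1
X = rng.normal(size=(200, 20))
X[:, 0] += y                                          # one informative feature
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)        # a highly correlated copy

X_res, y_res = random_oversample(X, y, rng)

# Select features with and without resampling, then compare the two sets.
selected_plain = top_k_features(X, y, k=5)
selected_resampled = top_k_features(X_res, y_res, k=5)

agreement = jaccard(selected_plain, selected_resampled)
similarity = correlated_fraction(X, selected_plain, selected_resampled)
print("class counts after resampling:", np.bincount(y_res))
print("agreement (Jaccard):", agreement)
print("similarity (correlated fraction):", similarity)
```

In a real evaluation, resampling would normally be applied only to the training folds inside cross-validation, so that the test data keep their original class distribution. The sketch also shows why agreement and similarity can diverge: two selections may pick different but highly correlated features, which lowers the set overlap without lowering the correlation-based similarity.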


Journal: Scientific Reports
Publication Date: Feb 3, 2024
Citations: 1
License type: CC BY 4.0

Similar Papers

Combining Models is More Likely to Give Better Predictions than Single Models.
Xiaoping Hu ... Simon Edwards
Phytopathology® | VOL. 105
28 Aug 2015

Commentary: Reporting standards are needed for evaluations of risk reclassification
M S Pepe ... H Janes
International Journal of Epidemiology | VOL. 40
13 May 2011

Pursuit of the Ultimate Regression Model for Samarium(III), Europium(III), and LiCl Using Laser-Induced Fluorescence, Design of Experiments, and a Genetic Algorithm for Feature Selection.
Hunter B Andrews ... Samantha K Cary
ACS omega | VOL. 8
03 Jan 2023

On the relative value of data resampling approaches for software defect prediction
Kwabena Ebo Bennin ... Akito Monden
Empirical Software Engineering | VOL. 24
21 Jun 2018
