Abstract

BackgroundIn radiomic studies, several models are often trained with different combinations of feature selection methods and classifiers. The features of the best model are usually considered relevant to the problem, and they represent potential biomarkers. Features selected from statistically similarly performing models are generally not studied. To understand the degree to which the selected features of these statistically similar models differ, 14 publicly available datasets, 8 feature selection methods, and 8 classifiers were used in this retrospective study. For each combination of feature selection and classifier, a model was trained, and its performance was measured with AUC-ROC. The best-performing model was compared to other models using a DeLong test. Models that were statistically similar were compared in terms of their selected features.ResultsApproximately 57% of all models analyzed were statistically similar to the best-performing model. Feature selection methods were, in general, relatively unstable (0.58; range 0.35–0.84). The features selected by different models varied largely (0.19; range 0.02–0.42), although the selected features themselves were highly correlated (0.71; range 0.4–0.92).ConclusionsFeature relevance in radiomics strongly depends on the model used, and statistically similar models will generally identify different features as relevant. Considering features selected by a single model is misleading, and it is often not possible to directly determine whether such features are candidate biomarkers.

Highlights

  • In radiomic studies, several models are often trained with different combinations of feature selection methods and classifiers

  • Radiomics is often performed using a wellestablished machine learning pipeline; generic features from the images are first extracted before feature selection methods and classifiers for modeling are employed [4, 5]

  • In the absence of a direct link between features and the underlying biology of various pathologies, in radiomic studies, many generic features are extracted in the hope that some would be associated with the Demircioğlu Insights into Imaging (2022) 13:28 biology and be predictive [7]

Read more

Summary

Introduction

Several models are often trained with different combinations of feature selection methods and classifiers. Radiomics is often performed using a wellestablished machine learning pipeline; generic features from the images are first extracted before feature selection methods and classifiers for modeling are employed [4, 5]. In the absence of a direct link between features and the underlying biology of various pathologies, in radiomic studies, many generic features are extracted in the hope that some would be associated with the Demircioğlu Insights into Imaging (2022) 13:28 biology and be predictive [7]. It seems natural to consider relevant features of the best-performing model as surrogates for biomarkers because such features contribute to the model’s predictive performance. They should be considered informative and are at least good candidates for biomarkers [10,11,12,13,14]

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call