An Empirical Approach for Avoiding False Discoveries When Applying High-Dimensional Radiomics to Small Datasets

Avishek Chatterjee, Sameh Saif, Anthony Dohan, Vipul Bist, Caroline Reinhold, Jan Seuntjens, Ives R Levesque, Yoshiko Ueno, Martin Vallieres

doi:10.1109/trpms.2018.2880617

Abstract

Purpose: Radiomic studies, which draw correlations between features of patients' medical images and patient outcomes, often deal with small datasets. Consequently, results can suffer from a lack of replicability and stability. This paper establishes a methodology to assess and reduce the impact of statistical fluctuations that may occur in small datasets. Such fluctuations can lead to false discoveries, particularly when applying feature selection or machine learning (ML) methods commonly used in the radiomics literature.

Methods: Two feature selection methods were created: one for choosing single predictive features, and another for obtaining feature sets that could be combined in a predictive model. The features were combined using ML tools less affected by overfitting (Naive Bayes, logistic regression, and linear support vector machines). Only three features were allowed to be combined at a time, further limiting overfitting. This methodology was applied to MR images from small datasets in metastatic liver disease (69 samples) and primary uterine adenocarcinoma (93 samples). The outcomes studied were desmoplasia (for liver metastases) and lymphovascular space invasion (LVSI), cancer staging (FIGO), and tumor grade (for uterine tumors). For outcomes in uterine cancer, the predictive models were tested on independent subsets.

Results: For the combined predictive feature approach, the model for LVSI, a prognostic factor that a human reader cannot detect, yielded AUC = 0.87 ± 0.07 and accuracy = 0.84 ± 0.09 in the testing set. For FIGO staging, AUC = 0.81 ± 0.03 and accuracy = 0.79 ± 0.08. For tumor grade, AUC = 0.76 ± 0.05 and accuracy = 0.70 ± 0.08.

Conclusion: Despite considering a large set ( $\sim 10^{4}$ ) of texture features, the false discovery avoidance methodology allowed only robust predictive models to be retained. Thus, the stringent false discovery avoidance methods introduced here do not preclude the discovery of promising correlations.
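The false-discovery risk described under Purpose can be illustrated with a small simulation. The sketch below is not the authors' methodology: it only generates pure-noise features for a dataset of 69 samples (the size of the liver dataset) and reports the best single-feature AUC found by chance. The function names, the alternating labels, and the default of 2000 features (fewer than the roughly $10^{4}$ in the abstract, to keep the run short) are assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_chance_auc(n_samples=69, n_features=2000, seed=0):
    """Best single-feature AUC among features that are pure noise.

    Hypothetical helper: the labels are arbitrary and the features are
    independent Gaussian noise, so any apparent predictive power is a
    false discovery.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # roughly balanced classes
    best = 0.5
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        a = auc(feature, labels)
        # A feature may "predict" in either direction, so consider both.
        best = max(best, a, 1.0 - a)
    return best


if __name__ == "__main__":
    print(f"Best AUC among noise features: {best_chance_auc():.2f}")
```

Because the selection step picks the maximum over many candidates, the reported value sits well above 0.5 even though no feature carries any signal, which is why evaluating selected features on an independent subset matters.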

Journal: IEEE Transactions on Radiation and Plasma Medical Sciences
Publication Date: Mar 1, 2019
Citations: 17

Similar Papers

Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study
Abdullateef O Balogun ... Malek A Almomani
09 Jul 2020
Symmetry | VOL. 12

Feature selection methods and predictive models in CT lung cancer radiomics
Gary Ge ... Jie Zhang
17 Dec 2022
Journal of Applied Clinical Medical Physics | VOL. 24

Genomic Prediction of Wheat Grain Yield Using Machine Learning
Manisha Sanjay Sirsat ... Paula Rodrigues Oblessuc
06 Sep 2022
Agriculture | VOL. 12

Interpretability and Class Imbalance in Prediction Models for Pain Volatility in Manage My Pain App Users: Analysis Using Feature Selection and Majority Voting Methods.
Quazi Abidur Rahman ... Hance Clarke
20 Nov 2019
JMIR Medical Informatics | VOL. 7
