Abstract
We investigated the performance of multiple radiomics feature extractors in predicting epidermal growth factor receptor (EGFR) mutation status in 228 patients with non–small cell lung cancer from publicly available data sets in The Cancer Imaging Archive. The imaging and clinical data were split into training (n = 105) and validation (n = 123) cohorts. Two of the most cited open-source feature extractors, IBEX (1563 features) and Pyradiomics (1319 features), and our in-house software, Columbia Image Feature Extractor (CIFE) (1160 features), were used to extract radiomics features. Univariate and multivariate analyses were performed sequentially to predict EGFR mutation status with each feature extractor individually. Our univariate analysis integrated an unsupervised clustering method to identify nonredundant and informative candidate features, from which prediction models were built by multivariate analysis. In training, the unsupervised clustering-based univariate analysis identified 5, 6, and 4 candidate features from IBEX, Pyradiomics, and CIFE, respectively. Multivariate prediction models using these features from IBEX, Pyradiomics, and CIFE yielded similar areas under the receiver operating characteristic curve of 0.68, 0.67, and 0.69, respectively. In validation, however, the areas under the receiver operating characteristic curve of the multivariate prediction models from IBEX, Pyradiomics, and CIFE decreased to 0.54, 0.56, and 0.64, respectively. Different feature extractors select different radiomics features, which leads to prediction models with varying performance. However, correlation between the features selected from different extractors may indicate that these features measure similar imaging phenotypes associated with similar biological characteristics. Overall, attention should be paid to the generalizability of individual radiomics features and radiomics prediction models.
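The selection step described in the abstract (rank features by univariate performance, then discard redundant ones) can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or redundancy criterion was used, so the code below swaps in a simple greedy correlation grouping as an assumption. The function names (`auc`, `pearson`, `select_candidates`), the 0.8 correlation threshold, and the toy feature values are all hypothetical and are not taken from the study.

```python
from math import sqrt


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_candidates(features, labels, corr_threshold=0.8):
    """Greedy stand-in for clustering-based selection (an assumption, not the
    authors' method): visit features from strongest to weakest univariate AUC
    and keep one only if it is not highly correlated with a kept feature."""

    def strength(name):
        a = auc(features[name], labels)
        return max(a, 1.0 - a)  # direction of association does not matter

    kept = []
    for name in sorted(features, key=strength, reverse=True):
        redundant = any(
            abs(pearson(features[name], features[k])) >= corr_threshold
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data (hypothetical): 1 = EGFR mutant, 0 = wild type.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "volume_mm3": [2.1, 4.0, 6.2, 8.1, 9.9, 12.0],  # near-duplicate of "volume"
    "texture": [3.0, 1.0, 2.0, 2.5, 0.5, 3.5],
}
candidates = select_candidates(features, labels)
```

In this toy example `volume_mm3` is dropped because it is almost perfectly correlated with `volume`, while `texture` is kept as a separate, weakly correlated feature. A real analysis would fit the selection on the training cohort only and report AUC on the held-out validation cohort, which is where the abstract reports the drop in performance.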
Highlights
Radiomics is a rapidly evolving field aiming to link phenotypes characterized from medical images with clinical data, including but not limited to, diagnostic, prognostic, and genomic information [1,2,3,4,5,6,7]
Although many prediction models related to both disease and treatment have been published, there is no standardized evaluation of their performance [2], for example, one based on publicly available data and open-source feature extractors
The Cancer Imaging Archive (TCIA) data consisted of 3 shared projects: non–small cell lung cancer (NSCLC)-Radiogenomics [51], The Cancer Genome Atlas (TCGA)-Lung Adenocarcinoma (TCGA-LUAD) [52], and TCGA-Lung Squamous Cell Carcinoma (TCGA-LUSC) [53]
Summary
Radiomics is a rapidly evolving field aiming to link phenotypes characterized from medical images with clinical data, including but not limited to diagnostic, prognostic, and genomic information [1,2,3,4,5,6,7]. Although many prediction models related to both disease and treatment have been published, there is no standardized evaluation of their performance [2], for example, one based on publicly available data and open-source feature extractors. The National Institutes of Health has encouraged medical imaging researchers to publicly share their data to stimulate open-science collaboration, and The Cancer Imaging Archive (TCIA) has evolved into a leading public database [34]. TCIA is a service that hosts a large archive of medical images of cancer accessible for public download.