Abstract
This study compares methods for selecting optimal radiomic and gene expression features to develop a radiogenomic phenotype that will be used to predict overall survival in non-small cell lung cancer (NSCLC) patients. Baseline CT images of 85 NSCLC patients (male/female: 58/27; event: death; adenocarcinoma/squamous cell carcinoma/unspecified: 41/32/12; stages I/II/III/unspecified: 39/25/12/9) with gene expression profiles (microarray data) of 33 genes were taken from the NSCLC-Radiomics-Genomics dataset, publicly available from the National Cancer Institute’s Cancer Imaging Archive (TCIA). The 33 genes were selected because they represent three major co-expression patterns (“signatures”) in the dataset: the histology, neuroendocrine (NE) and pulmonary surfactant systems (PSS) signatures. ITK-SNAP was used for 3D tumor volume segmentation from the CT scans. Radiomic features (n=102) were extracted from the 3D tumor volume using the CaPTk software.
Three feature selection approaches were compared. The first approach performs feature selection in two steps. Intra-modal feature selection retains, within each of the radiomic and genomic modalities, features that are not highly correlated with each other, do not have a skewed distribution, have a positive Mean Decrease in Accuracy (MDA) value, and maximize the AUC in the prediction of overall survival. Inter-modal feature selection then retains features that are not highly correlated with features from the other modality. The second approach builds on the widely used Principal Component Analysis (PCA) but tries to improve on its poor performance for survival analysis by using consensus clustering to determine the optimal number of feature clusters within the radiomic and genomic modalities. For each cluster, the first principal component is calculated and used as the representative feature for that highly correlated cluster.
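To make the correlation criterion of the first approach concrete, the following is a minimal sketch of a greedy redundancy filter. It is not the study's implementation: the function names, the toy feature values and the 0.8 threshold are assumptions for illustration only, since the abstract does not state the cutoff or the software used for this step.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: treated as uncorrelated in this sketch
    return sxy / sqrt(sxx * syy)


def drop_correlated(features, threshold=0.8):
    """Keep a feature only if |r| <= threshold against every feature kept so far.

    `features` maps a feature name to its per-patient values.
    The threshold of 0.8 is a hypothetical value, not taken from the study.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (invented): surface_area is nearly collinear with volume.
toy_features = {
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "surface_area": [2.1, 3.9, 6.2, 8.0, 9.9],
    "entropy": [0.3, 0.9, 0.1, 0.7, 0.4],
}
selected = drop_correlated(toy_features)
print(selected)
```

The same filter could be applied a second time across modalities for the inter-modal step, by checking each candidate against the features already retained from the other modality. The result depends on the order in which features are visited, so a real pipeline would need a rule for that ordering.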
The third approach takes a supervised view of feature selection by fitting a Cox regression with lasso regularization on the radiomic and genomic features to quantify the association between each individual feature and the overall survival outcome. The features most strongly associated with the outcome are selected.
For each of the three approaches, consensus clustering with a 10% cutoff for the minimum change in the cumulative distribution function is used to derive the optimal multi-modal phenotypes from the selected multi-modal features. The multi-modal phenotypes were combined with the clinical factors of histology, stage and sex in five-fold cross-validated multivariate Cox proportional hazards models (200 iterations) of overall survival. In addition to the cross-validated c-statistics, we also built a model on the complete dataset for each approach to evaluate the Kaplan-Meier performance in separating participants above versus below the median prognostic score.
The first approach gives a survival prediction performance (c-statistic 0.61, [0.55, 0.63]) comparable to that of the third approach (0.61, [0.56, 0.65]). The second approach yields a model with comparatively lower prognostic performance (0.54, [0.48, 0.60]). All three approaches yield models that improve on the prognostic performance of the model built using only clinical covariates (0.53, [0.50, 0.59]). This preliminary study draws comparisons between methods for selecting optimal features from multi-modal descriptors of tumor regions.
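For readers unfamiliar with the evaluation metrics, the sketch below shows one common way a c-statistic (Harrell's concordance index) and a median split of prognostic scores can be computed. It is a simplified illustration under stated assumptions, not the study's code: the function names and toy values are invented, tied survival times are ignored, and the cross-validation and Cox model fitting are omitted.

```python
from statistics import median


def concordance_index(times, events, risk_scores):
    """Harrell's c: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had the event and
    time_i < time_j. Tied risk scores count as half concordant.
    Tied survival times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def median_split(risk_scores):
    """Label each patient 'high' or 'low' relative to the median score."""
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]


# Toy data (invented): survival time, event flag (1 = death), model risk score.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]

c_stat = concordance_index(toy_times, toy_events, toy_risk)
groups = median_split(toy_risk)
print(c_stat, groups)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported values between 0.53 and 0.61. The two groups produced by the median split are the ones whose Kaplan-Meier curves would then be compared.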