Abstract

Aims: Despite the promising results achieved by radiomics prognostic models for various clinical applications, multiple challenges still need to be addressed. The two main limitations of radiomics prognostic models are the limited information available from a single imaging modality and the difficulty of selecting the optimum machine learning and feature selection methods for the considered modality and clinical outcome. In this work, we applied several feature selection and machine learning methods to single-modality positron emission tomography (PET) and computed tomography (CT) and to multimodality PET/CT fusion, to identify the best combinations for each radiomics modality for overall survival prediction in non-small cell lung cancer patients.

Materials and methods: A PET/CT dataset from The Cancer Imaging Archive, including subjects from two independent institutions (87 and 95 patients), was used in this study. Each cohort was used once as the training set and once as the test set, and the results were averaged. ComBat harmonisation was used to address the centre effect. In our proposed radiomics framework, apart from single-modality PET and CT models, multimodality radiomics models were developed using multilevel (feature-level and image-level) fusion. Two methods were considered for the feature-level strategy: concatenating PET and CT features into a single feature set and, alternatively, averaging them. For image-level fusion, we used three methods, namely wavelet fusion, guided filtering-based fusion and latent low-rank representation fusion. In the proposed prognostic modelling framework, combinations of four feature selection and seven machine learning methods were applied to all radiomics modalities (two single-modality and five multimodality), machine learning hyper-parameters were optimised and, finally, the models were evaluated in the test cohort with 1000 repetitions via bootstrapping. The feature selection and machine learning methods were chosen because they are popular in the literature, are supported by open-source software in the public domain and can cope with continuous time-to-event survival data. Multifactor ANOVA was used to carry out variability analysis, and the proportion of total variance explained by radiomics modality, feature selection and machine learning methods was calculated using a bias-corrected effect size estimate known as ω².

Results: Optimum feature selection and machine learning methods differed depending on the radiomics modality applied. However, minimum depth (MD) as the feature selection method and the Lasso and Elastic-Net regularized generalized linear model (glmnet) as the machine learning method had the highest average results. Results from the ANOVA test indicated that the variability that each factor (radiomics modality, feature selection and machine learning methods) introduces to model performance is case specific, i.e. the variances differ across radiomics modalities and fusion strategies. Overall, the greatest proportion of variance was explained by the machine learning method, except for models using the feature-level fusion strategy.

Conclusion: The identification of optimal feature selection and machine learning methods is a crucial step in developing sound and accurate radiomics risk models. Furthermore, the optimum methods are case specific, differing with the radiomics modality and fusion strategy used.
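
As an illustration of the variability analysis described in the methods, the following minimal Python sketch computes the bias-corrected effect size ω² for each factor from an ANOVA table. It assumes the common fixed-effects definition, ω² = (SS_effect − df_effect · MS_error) / (SS_total + MS_error); the abstract does not state which variant the authors used. The function name `omega_squared` and all sums of squares are hypothetical values chosen for illustration, not results from the study, and interaction terms are omitted for brevity.

```python
def omega_squared(ss_effect, df_effect, ss_error, df_error, ss_total):
    """Bias-corrected share of total variance attributed to one ANOVA factor.

    Assumes the standard fixed-effects formula:
        (SS_effect - df_effect * MS_error) / (SS_total + MS_error)
    """
    ms_error = ss_error / df_error  # mean square error
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Hypothetical main-effects ANOVA table: factor -> (sum of squares, degrees of freedom).
# Degrees of freedom follow the design in the abstract: 7 radiomics modalities,
# 4 feature selection methods and 7 machine learning methods.
anova = {
    "radiomics_modality": (2.0, 6),
    "feature_selection": (1.0, 3),
    "machine_learning": (5.0, 6),
}

# 7 * 4 * 7 = 196 models, so 195 total degrees of freedom; 195 - 15 = 180 remain for error.
ss_error, df_error = 3.6, 180  # illustrative residual sum of squares
ss_total = sum(ss for ss, _ in anova.values()) + ss_error

omega = {
    factor: omega_squared(ss, df, ss_error, df_error, ss_total)
    for factor, (ss, df) in anova.items()
}

for factor, value in omega.items():
    print(f"{factor}: omega^2 = {value:.3f}")
```

With these made-up numbers the machine learning factor has the largest ω², mirroring the kind of comparison reported in the results; in practice the sums of squares would come from an ANOVA fitted to the model performance scores.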
