Effect of machine learning methods on predicting NSCLC overall survival time based on Radiomics analysis

Wenzheng Sun, Mingyan Jiang, Panchun Chang, Fang-Fang Yin, Jun Dang

doi:10.1186/s13014-018-1140-9

Abstract

Background: To investigate the effect of machine learning methods on predicting overall survival (OS) for non-small cell lung cancer based on radiomics feature analysis.

Methods: A total of 339 radiomic features were extracted from the segmented tumor volumes of pretreatment computed tomography (CT) images. These radiomic features quantify the tumor phenotypic characteristics on the medical images using tumor shape and size, intensity statistics, and textures. The performance of 5 feature selection methods and 8 machine learning methods was investigated for OS prediction. Predictive performance was evaluated with the concordance index between predicted and true OS for the non-small cell lung cancer patients. The survival curves were estimated by the Kaplan-Meier method and compared with log-rank tests.

Results: The gradient boosting linear model based on Cox's partial likelihood, combined with the concordance index feature selection method, obtained the best performance (concordance index: 0.68, 95% confidence interval: 0.62 to 0.74).

Conclusions: The preliminary results demonstrated that certain machine learning and radiomics analysis methods could predict OS of non-small cell lung cancer accurately.

Highlights

We investigated the effect of 8 machine learning (ML) methods and 5 feature selection methods on predicting overall survival (OS) for non-small cell lung cancer based on radiomics analysis
Performance was evaluated with the concordance index (CI), with confidence intervals (CFI) estimated by bootstrapping
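
The evaluation named in the highlights can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes Harrell's definition of the concordance index for right-censored data and a percentile bootstrap for the confidence interval, neither of which is specified in this summary. The function names, the default of 500 resamples, and the convention that a higher risk score means shorter survival are all assumptions made for the example.

```python
import random


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs that are ranked correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (events[i] is truthy). Assumed convention: a
    higher risk score should correspond to a shorter survival time.
    Pairs with identical times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_ci(times, events, risk_scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the concordance index."""
    # Fail early if the original sample itself has no comparable pairs.
    concordance_index(times, events, risk_scores)
    rng = random.Random(seed)
    n = len(times)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        try:
            stats.append(concordance_index(
                [times[k] for k in idx],
                [events[k] for k in idx],
                [risk_scores[k] for k in idx],
            ))
        except ValueError:
            continue  # degenerate resample without comparable pairs, draw again
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy data: survival times in months, event flags (1 = death observed), model risk scores.
    t = [5, 10, 15, 20, 25, 30]
    e = [1, 1, 0, 1, 1, 0]
    r = [0.9, 0.4, 0.6, 0.5, 0.2, 0.1]
    print(concordance_index(t, e, r), bootstrap_ci(t, e, r))
```

With a cohort of realistic size, the interval returned by `bootstrap_ci` plays the role of the "95% confidence interval" reported next to the concordance index in the abstract, although the exact resampling scheme used in the paper may differ.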

Summary

Introduction

This study investigates the effect of machine learning methods on predicting overall survival (OS) for non-small cell lung cancer based on radiomics feature analysis. Radiomics analysis can quantitatively extract a large number of imaging features, which could offer a cost-effective and non-invasive approach to individualized medicine [3,4,5]. Several studies have shown the predictive and diagnostic ability of radiomics features in different kinds of cancers using various medical imaging modalities. Machine learning (ML) can be broadly defined as computational methods that use data or experience to obtain precise predictions [12]. An ML method first learns patterns from the data and then automatically builds an accurate and efficient prediction model based on those patterns. It is therefore crucial to compare the performance of different ML models for clinical biomarkers based on radiomics analysis. Because high-throughput radiomics features can cause serious overfitting, appropriate feature selection methods should be applied first.
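
To make the feature selection step concrete, here is a minimal sketch of one plausible filter: score each radiomic feature by its univariate concordance index against survival and keep the features that deviate most from 0.5 (no discrimination). This is an assumption for illustration only; the summary does not describe how the paper's concordance index feature selection method works in detail. The names `c_index` and `rank_features`, and the feature names in the usage example, are hypothetical.

```python
def c_index(times, events, scores):
    """Compact Harrell's C; a higher score is assumed to mean shorter survival."""
    num = den = 0.0
    for ti, ei, si in zip(times, events, scores):
        if not ei:
            continue  # censored patients cannot anchor a comparable pair
        for tj, sj in zip(times, scores):
            if ti < tj:
                den += 1
                num += 1.0 if si > sj else 0.5 if si == sj else 0.0
    if den == 0:
        raise ValueError("no comparable pairs")
    return num / den


def rank_features(features, times, events, top_k):
    """Return the names of the top_k features by |C - 0.5|.

    features maps a feature name to one value per patient. Using the
    absolute deviation from 0.5 keeps features that are strongly
    associated with survival in either direction.
    """
    strength = {
        name: abs(c_index(times, events, values) - 0.5)
        for name, values in features.items()
    }
    return sorted(strength, key=strength.get, reverse=True)[:top_k]


if __name__ == "__main__":
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    features = {
        "constant_feature": [2, 2, 2, 2],   # carries no ranking information
        "tumor_volume": [4, 3, 2, 1],       # larger value, shorter survival
    }
    print(rank_features(features, times, events, top_k=1))
```

Reducing the 339 extracted features to a small subset in this way, before fitting any of the ML models, is one way to limit the overfitting risk described above. To avoid optimistic bias, the ranking should be computed on training data only.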

Methods

Results

Conclusion