Abstract
Lung cancer is the deadliest cancer, and non-small cell lung cancer (NSCLC) accounts for 80-85% of lung cancer cases. Cancer recurrence is defined as the return of cancer after surgical resection of the tumor and occurs in more than 30% of NSCLC patients. Contributing causes include several genomic factors, incomplete removal of the tumor, resistance to drugs and chemotherapy, and the presence of cancer stem cells. A preoperative assessment of the risk of recurrence can be crucial for clinicians. The aim of this work is to develop machine learning (ML) models that predict recurrence in NSCLC patients from gene expression data. The gene expression data of 130 NSCLC patients were obtained from a public dataset, NSCLC-Radiogenomics. Monte-Carlo Feature Selection (MCFS), Boruta feature selection, and a combination of MCFS and Boruta were used to identify significant genes to serve as input features. Supervised ML models were trained with 5-fold cross-validation using the Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forest (RF) algorithms. The Synthetic Minority Oversampling Technique (SMOTE) was used to handle the class imbalance in the input data. The models trained on SMOTE-applied data outperformed the models trained on the original (imbalanced) data. The best performance with 5-fold cross-validation was obtained by the SVM model, with an accuracy of 0.99 and a Matthews correlation coefficient (MCC) of 0.99. The SVM model also achieved an area under the receiver operating characteristic curve (AUC) of 0.98. The models also performed well when validated on a held-out blind dataset. In summary, ML-based prediction of recurrence in NSCLC patients can aid clinicians in finalizing postoperative treatment.
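The abstract reports both accuracy and MCC (Matthews correlation coefficient) on class-imbalanced data. The following minimal sketch illustrates why the two metrics can diverge under imbalance. It is not the authors' code: the function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
import math


def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention when a
    row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical cohort (illustration only): 100 patients, 30 with recurrence.
# A classifier that always predicts "no recurrence" never finds a positive.
always_negative = dict(tp=0, tn=70, fp=0, fn=30)

# A hypothetical classifier that labels every patient correctly.
perfect = dict(tp=30, tn=70, fp=0, fn=0)

for name, counts in [("always negative", always_negative), ("perfect", perfect)]:
    acc = accuracy_from_counts(**counts)
    mcc = mcc_from_counts(**counts)
    print(f"{name}: accuracy={acc:.2f}, MCC={mcc:.2f}")
```

In this illustration the always-negative classifier reaches an accuracy of 0.70 while its MCC is 0.00, which is why MCC is often reported alongside accuracy when one class is much rarer than the other.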