Abstract

Background: In this retrospective study, we aimed to construct and validate an early warning model for lung cancer using machine learning. Methods: The CDKN2A gene expression profile and clinical information were downloaded from The Cancer Genome Atlas (TCGA) database and divided into a tumor group and a normal group (n = 57). The top 5 somatic mutation-related genes were extracted from 567 somatic mutation records downloaded from the TCGA database using a random forest algorithm. A Cox proportional hazards model and a nomogram were constructed by combining CDKN2A, the 5 somatic mutation-related genes, gender, and smoking index. Patients were divided into high-risk and low-risk groups according to their risk score. The prognostic performance of the model was assessed by Kaplan-Meier survival analysis and receiver operating characteristic (ROC) curves. Results: We constructed a prognostic model consisting of 5 somatic mutation-related genes (sphingosine 1-phosphate receptor 1 [S1PR1], dedicator of cytokinesis 7 [DOCK7], DEAD-box helicase 4 [DDX4], laminin subunit beta 3 [LAMB3], and importin 5 [IPO5]), cyclin-dependent kinase inhibitor 2A (CDKN2A), gender, and smoking index. The high-risk group had a lower overall survival rate than the low-risk group (hazard ratio = 2.14, P = 0.0323). The areas under the ROC curve for predicting 3-year, 5-year, and 10-year survival were 0.609, 0.673, and 0.698, respectively. The accuracy, sensitivity, and specificity of the model for predicting 10-year survival in lung cancer were 76.19%, 56.71%, and 86.23%, respectively. Conclusion: The lung cancer early warning model and nomogram may provide an essential reference for the clinical management of patients with lung cancer.
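As an illustration of the risk-stratification and evaluation steps named in the Methods, the following is a minimal sketch, not the authors' implementation. It assumes the risk score is the Cox linear predictor (sum of coefficient times covariate value) and that patients are split at the median score; the abstract does not state the cutoff. All coefficient values, covariate names, and patient data are hypothetical.

```python
from statistics import median


def risk_score(coefs, covariates):
    """Cox linear predictor: sum of coefficient * covariate value."""
    return sum(coefs[name] * covariates[name] for name in coefs)


def split_by_median(scores):
    """Label each score 'high' or 'low' relative to the median.

    The median cutoff is an assumption; the abstract does not specify one.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def classification_metrics(predicted, actual):
    """Accuracy, sensitivity, and specificity from boolean labels.

    predicted[i] is True if patient i is predicted to have the event;
    actual[i] is True if the event was observed.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical coefficients and patients, for illustration only.
coefs = {"gene_a": 0.5, "smoking_index": 0.25}
patients = [
    {"gene_a": 2.0, "smoking_index": 4.0},
    {"gene_a": 0.0, "smoking_index": 0.0},
    {"gene_a": 4.0, "smoking_index": 4.0},
    {"gene_a": 2.0, "smoking_index": 0.0},
]
scores = [risk_score(coefs, p) for p in patients]
groups = split_by_median(scores)
```

In practice the coefficients would come from a fitted Cox model, and the group labels would feed the Kaplan-Meier comparison between the high-risk and low-risk groups.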
