Abstract

Background
Transcriptome data provide massive amounts of information that can be used to characterize many diseases and to predict patient outcomes. The goal of our research is to predict the survival time of lung adenocarcinoma patients and to improve the accuracy of classifying patients into long-survival and short-survival cohorts.

Methods
We used the Relief algorithm to select prognostic features related to the survival time of lung adenocarcinoma patients. We then predicted whether each patient's survival time exceeded 3 years using eight machine learning algorithms: Support Vector Machines, Random Forests, Logistic Regression, Naïve Bayes, Linear Regression, Support Vector Regression (polynomial kernel), Support Vector Regression (linear kernel), and Ridge Regression. The best-performing algorithm was chosen to build a predictive model of survival time, and a second dataset was used to verify the stability and suitability of this model. We also explored the mechanisms underlying the RNA expression changes of the 22 selected genetic features by relating them to the corresponding DNA mutations and DNA methylation patterns.

Results
The best-performing algorithm was Naïve Bayes (accuracy = 75%, AUC = 0.81) using the top 22 genetic features, and it also performed stably and well on the second dataset. Across the 22 genes, the number of coupled mutations was lower in the long-survival group (>6 years) than in the short-survival group (<1 year) (P = 0.031).

Conclusions
Naïve Bayes applied to the expression of this 22-gene panel can predict the survival time of lung adenocarcinoma patients. These 22 genes affect the survival time of lung adenocarcinoma patients.
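The Methods pipeline (Relief feature selection followed by a binary "survival > 3 years" classifier scored by accuracy and AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which Relief variant, Naïve Bayes likelihood, or train/test protocol was used. Here we assume a basic two-class Relief with Manhattan distance on min-max scaled features, a Gaussian Naïve Bayes model, and AUC computed as the Mann-Whitney probability. All function names (`relief_weights`, `fit_gaussian_nb`, `prob_long`, `auc`) are hypothetical, and the data are synthetic toy values, not patient data.

```python
import math
import random


def relief_weights(X, y, n_iter=None, seed=0):
    """Basic two-class Relief (assumed variant): features that differ on the
    nearest miss gain weight, features that differ on the nearest hit lose it."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(h - v) or 1.0 for v, h in zip(lo, hi)]  # avoid /0 on constant features

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]  # per-feature difference in [0, 1]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))  # Manhattan distance

    rng = random.Random(seed)
    m = n_iter or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        s = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[s], j) - diff(X[i], X[h], j)) / m
    return w


def fit_gaussian_nb(X, y):
    """Per class: log prior plus per-feature mean and (smoothed) variance."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [sum((v - mu) ** 2 for v in col) / len(rows) + 1e-9
                     for col, mu in zip(cols, means)]
        model[c] = (math.log(len(rows) / len(X)), means, variances)
    return model


def prob_long(model, x):
    """Posterior probability of class 1 (assumed here: survival > 3 years)."""
    lp = {}
    for c, (log_prior, means, variances) in model.items():
        lp[c] = log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - mu) ** 2 / (2 * v)
            for xi, mu, v in zip(x, means, variances))
    top = max(lp.values())  # log-sum-exp for numerical stability
    return math.exp(lp[1] - top) / sum(math.exp(v - top) for v in lp.values())


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        return float("nan")  # undefined without both classes
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic toy data, NOT a real cohort: 80 "patients", 5 "genes";
    # only genes 0 and 1 are shifted in long survivors (> 3 years).
    years = [rng.uniform(0.5, 8.0) for _ in range(80)]
    y = [1 if t > 3 else 0 for t in years]
    X = [[rng.gauss(3 * c, 1), rng.gauss(-3 * c, 1)] + [rng.gauss(0, 1) for _ in range(3)]
         for c in y]
    train, test = range(60), range(60, 80)
    w = relief_weights([X[i] for i in train], [y[i] for i in train])
    top = sorted(range(5), key=lambda j: -w[j])[:2]  # keep the k best-weighted genes

    def pick(x):
        return [x[j] for j in top]

    model = fit_gaussian_nb([pick(X[i]) for i in train], [y[i] for i in train])
    scores = [prob_long(model, pick(X[i])) for i in test]
    acc = sum((s > 0.5) == (y[i] == 1) for s, i in zip(scores, test)) / len(test)
    print("selected genes:", top, f"accuracy: {acc:.2f}",
          f"AUC: {auc(scores, [y[i] for i in test]):.2f}")
```

In this sketch the Relief weights are computed on the training split only, so test labels do not leak into feature selection; the toy run keeps k = 2 features, whereas the study kept 22.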
