Abstract

Predicting the outcome after a cancer diagnosis is critical. Advances in high-throughput sequencing technologies provide physicians with vast amounts of data, yet prognostication remains challenging because the data are high-dimensional and complex. We evaluated genes related to the Wnt/β-catenin, carbohydrate metabolism, and PI3K-Akt signaling pathways as predictive features for classifying tumor and normal samples. Using differentially expressed genes as controls, we assessed the accuracy of these pathway-related genes with support-vector machines and three other recommended machine learning models, namely, the random forest, decision tree, and k-nearest neighbor algorithms. The first two outperformed the others. All candidate pathway-related genes yielded areas under the curve exceeding 95.00% for cancer outcomes, and they were most accurate in predicting colorectal cancer. These results suggest that these pathway-related genes are useful and accurate biomarkers for understanding the mechanisms behind cancer development.
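
The area under the curve (AUC) reported above can be computed from any classifier's scores. The following is a minimal sketch of the rank-based (Mann-Whitney) formulation, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (tumor) or 0 (normal); hypothetical encoding.
    scores: classifier scores, higher meaning "more likely tumor".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    # Count (tumor, normal) pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 1.0 means every tumor sample is scored above every normal sample, and 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but not for large cohorts.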

Highlights

  • Cancer, associated with high mortality, is a serious threat to public health

  • Data preprocessing is a crucial step in machine learning (ML), and errors at this stage can lead to misleading prediction results. This study included the following preprocessing steps: data were normalized for each sample by first transforming the data using the log ratio (base 2) and then, for each probe, calculating the median of the log-summarized values across all samples and subtracting it from each sample's value

  • We demonstrated that three cancer-related pathways have high predictive accuracy compared with differentially expressed genes (DEGs) for cancer
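
The normalization described in the highlights (a base-2 log transform followed by per-probe median centering) can be sketched as follows. This is an illustrative reading of that description, not the authors' pipeline; the function name `log2_median_center`, the dictionary layout, and the probe names are assumptions, and the sketch assumes strictly positive intensities.

```python
import math
from statistics import median


def log2_median_center(expr):
    """Log2-transform each value, then subtract the per-probe median.

    expr: dict mapping a probe ID to a list of positive intensities,
          one entry per sample (hypothetical layout).
    """
    centered = {}
    for probe, values in expr.items():
        logged = [math.log2(v) for v in values]    # step 1: log base 2
        probe_median = median(logged)              # step 2: median across samples
        centered[probe] = [x - probe_median for x in logged]  # step 3: subtract
    return centered


# Toy matrix: two probes measured in three samples (illustrative only).
raw = {
    "probe_A": [2.0, 4.0, 8.0],
    "probe_B": [16.0, 16.0, 64.0],
}
centered = log2_median_center(raw)
print(centered)
```

After centering, each probe has a median of zero across samples, so values read as log2 fold differences relative to that probe's typical level.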


Summary

Introduction

Cancer, associated with high mortality, is a serious threat to public health. One cause of the high mortality rate is that symptoms are nonspecific in the early stages, resulting in a poor prognosis and a high fatality rate. Thus, accurately predicting cancer is a critical and urgent task for physicians. Because cancer is fundamentally caused by gene malfunction, using gene expression levels as a relatively direct method of diagnosis has attracted a great deal of research attention. Analyses of gene expression data have greatly benefited cancer diagnosis and treatment [1,2,3]. However, the high dimensionality and noise associated with the data can make these analyses and applications challenging. To reduce these challenges, data are initially processed to identify a small subset of genes primarily responsible for the disease [4, 5]. Feature selection is reportedly a very effective method for reducing the high dimensionality of gene expression datasets [6].
