Abstract

Cancer classification at the molecular level has become a relatively routine research procedure thanks to advances in high-throughput molecular profiling techniques. However, in gene expression studies the number of genes typically far exceeds the number of samples. Existing gene selection methods are mostly based on statistics and machine learning, and they overlook relevant biological principles or knowledge when working with biological data. Here, we propose a robust ensemble learning paradigm that incorporates information from multiple pathways to classify cancer. We compare the proposed method with other methods, such as Elastic SCAD and PPDMF, and evaluate the classification performance. The results show that the proposed method performs better on most metrics and is robust. We further investigate the biological mechanisms of the ensemble feature genes. The results demonstrate that the ensemble feature genes are associated with drug targets and clinically relevant cancer. In addition, function annotation identifies some core biological pathways and biological processes underlying clinically relevant phenotypes. Overall, our research provides a new perspective for further study of the molecular activities and manifestations of cancer.

Highlights

  • Accurate classification of cancer is crucial for the patient to receive appropriate therapy[1, 2]

  • We evaluated the performance of the proposed ensemble method using five measures: accuracy, precision (positive predictive value), sensitivity (true positive rate), specificity, and F-score, calculated as shown in Figure 3

  • Compared with the other two methods on dataset GSE25066, the proposed method achieves higher performance and performs well on all metrics, with an average accuracy of 68.81% compared with 64.62% for the Elastic smoothly clipped absolute deviation penalty (SCAD) and 65.20% for the PPDMF approach
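
The five measures named above can be sketched from a binary confusion matrix. This is a minimal illustration using the standard textbook definitions, not the authors' code. It assumes that "F-score" means the balanced F1 score and that one class is treated as the positive class; the function name `classification_metrics` and the `positive` parameter are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, sensitivity, specificity and F-score.

    Assumption: binary labels, with `positive` marking the positive class.
    Assumption: F-score is the balanced F1 (harmonic mean of precision
    and sensitivity); the source does not state the weighting.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)      # positive predictive value
    sensitivity = ratio(tp, tp + fn)    # true positive rate
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy labels for illustration only; these are not data from the study.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Returning 0.0 for an undefined ratio is a convenience choice for this sketch; a real evaluation might prefer to report such cases as missing.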


Summary

Introduction

Accurate classification of cancer is crucial for the patient to receive appropriate therapy[1, 2]. In the work of Cai et al.[6], the authors applied an ensemble-based feature extraction method, which incorporates Multi-category Receiver Operating Characteristic (Multi-ROC), Random Forests (RFs), and Maximum Relevance and Minimum Redundancy (mRMR) methods, to select molecular signatures. The above-mentioned gene selection methods are based on statistics and machine learning; they seldom involve relevant biological principles or knowledge when working with biological data. Huang et al.[17] developed a personalized pathway-based diagnostic modeling framework (abbreviated as PPDMF), which converts omics-level features to pathway-level features using the non-parametric principal curve approach and subjects them to feature selection and machine learning classification to differentiate phenotypes. In our method, we select the differentially expressed (DE) genes of each pathway and train an SVM on them to generate a group of base learners, then rank all DE pathways by classification accuracy on the training set. Experimental results on different data sets in this paper indicate that our proposed method is very promising and robust.
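
The pathway-wise procedure described above (one base learner per pathway, pathways ranked by training accuracy) can be sketched as follows. This is a rough illustration under several assumptions, not the authors' implementation. To keep it dependency-free, a nearest-centroid classifier stands in for the SVM. The DE genes of each pathway are assumed to be given already. Keeping the `top_k` pathways and combining them by majority vote are assumptions, since the summary does not say how the base learners are combined. All function and variable names are hypothetical.

```python
from collections import Counter


def fit_centroids(samples, labels, genes):
    """Per-class mean expression over one pathway's genes.

    Stand-in for the SVM base learner named in the text.
    """
    counts = Counter(labels)
    sums = {}
    for sample, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(genes))
        for i, gene in enumerate(genes):
            acc[i] += sample[gene]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_one(centroids, sample, genes):
    """Assign the class whose centroid is nearest in squared distance."""
    def distance(label):
        return sum((sample[g] - centroids[label][i]) ** 2 for i, g in enumerate(genes))
    return min(sorted(centroids), key=distance)


def train_ensemble(samples, labels, pathways, top_k):
    """Train one learner per pathway and keep the top_k by training accuracy."""
    learners = []
    for name, genes in pathways.items():
        centroids = fit_centroids(samples, labels, genes)
        preds = [predict_one(centroids, s, genes) for s in samples]
        accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
        learners.append((accuracy, name, genes, centroids))
    # Rank pathways by training accuracy, breaking ties by name.
    learners.sort(key=lambda item: (-item[0], item[1]))
    return learners[:top_k]


def predict_ensemble(ensemble, sample):
    """Majority vote over the retained base learners (an assumed rule)."""
    votes = Counter(predict_one(c, sample, genes) for _, _, genes, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy expression values and pathway memberships, for illustration only.
toy_pathways = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3"], "pathway_C": ["g4"]}
toy_samples = [
    {"g1": 1.0, "g2": 1.2, "g3": 0.9, "g4": 5.0},
    {"g1": 1.1, "g2": 0.8, "g3": 1.1, "g4": 1.0},
    {"g1": 0.9, "g2": 1.0, "g3": 1.0, "g4": 5.1},
    {"g1": 3.0, "g2": 3.1, "g3": 2.9, "g4": 1.1},
    {"g1": 2.9, "g2": 3.2, "g3": 3.1, "g4": 5.2},
    {"g1": 3.1, "g2": 2.8, "g3": 3.0, "g4": 0.9},
]
toy_labels = [0, 0, 0, 1, 1, 1]
toy_ensemble = train_ensemble(toy_samples, toy_labels, toy_pathways, top_k=2)
```

Ranking on training accuracy follows the description in the text. In practice an odd `top_k` avoids tied votes in a binary problem, and a held-out set would be needed to estimate how well the ensemble generalizes.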

Methods
Results
Conclusion
