Abstract

635

Background: Colorectal cancer (CRC) remains a leading cause of cancer-related mortality in the United States. A key therapeutic dilemma in the treatment of CRC is whether patients with stage II and stage III disease require adjuvant chemotherapy after surgical resection. Attempts to improve identification of patients at increased risk of recurrence have yielded many predictive models based on gene expression data, but none is FDA approved and none is used in standard clinical practice. To improve recurrence prediction, we use a machine learning approach to predict recurrence status at 3 years after diagnosis.

Methods: A dataset was curated from six publicly available microarray datasets, and multiple views were generated to include information from non-tumor tissue gene expression patterns, gene set structure, protein-protein interaction network structure, previously curated molecular signatures, and identified tumor suppressor/driver mutations. These views were used to train a diverse pool of base learners using 10 repeats of 10-fold cross-validation. Stacked generalization was then used to train an ensemble model, also known as a meta-learner, on the predictions of these base learners.

Results: Microarray-trained models performed significantly better than models trained on clinical data (paired Wilcoxon signed-rank test, p = 1.49 × 10⁻⁸), indicating that molecular data predict recurrence better than basic clinical data. Review of model training performance revealed that non-linear classifiers often outperform linear classifiers, and that ensemble methods can further enhance performance. We also demonstrate the feasibility of the multiple-view multiple-learner (MVML) supervised learning framework for generating and integrating predictions across a diverse set of learners: the meta-learner matched or exceeded the best base learners on all performance metrics.
Conclusions: This work represents the first effort to use ensemble learning to predict CRC recurrence and highlights the promise of ensemble learning to improve the performance of predictive models in order to realize the goals of precision medicine.
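The stacking step described in Methods can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration rather than the study's pipeline: the data are synthetic, the two "views" are random feature sets, the base learners (`centroid_scorer`, `knn_scorer`) are toy stand-ins, and the cross-validation is reduced to 2 repeats of 5 folds to keep the example short. What it does show is the core idea of stacked generalization: each base learner is scored only on samples it was not trained on, and a meta-learner is then fitted to those out-of-fold scores.

```python
import math
import random

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def centroid_scorer(X, y):
    """Toy base learner: nearest centroid. Assumes both classes occur in X."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = mean([x for x, t in zip(X, y) if t == 0])
    c1 = mean([x for x, t in zip(X, y) if t == 1])
    def score(x):
        # Closer to the class-1 centroid gives a score nearer to 1.
        return 1.0 / (1.0 + math.exp(math.dist(x, c1) - math.dist(x, c0)))
    return score

def knn_scorer(X, y, k=3):
    """Toy base learner: fraction of positives among the k nearest neighbours."""
    def score(x):
        nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
        return sum(t for _, t in nearest) / k
    return score

def out_of_fold(views, y, learners, k=5, repeats=2, seed=0):
    """One row per sample, one column per (view, learner) pair.
    Each entry is the held-out score averaged over the repeats."""
    rng = random.Random(seed)
    n = len(y)
    meta = [[0.0] * (len(views) * len(learners)) for _ in range(n)]
    for _ in range(repeats):
        for test in kfold_indices(n, k, rng):
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            col = 0
            for X in views:
                for fit in learners:
                    score = fit([X[i] for i in train], [y[i] for i in train])
                    for i in test:
                        meta[i][col] += score(X[i]) / repeats
                    col += 1
    return meta

def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    def predict(z):
        return 1.0 / (1.0 + math.exp(-(b + sum(wi * zi for wi, zi in zip(w, z)))))
    for _ in range(epochs):
        errors = [predict(z) - t for z, t in zip(Z, y)]
        grad_w = [sum(e * z[j] for e, z in zip(errors, Z)) for j in range(len(w))]
        w = [wi - lr * g / len(y) for wi, g in zip(w, grad_w)]
        b -= lr * sum(errors) / len(y)
    return lambda z: 1.0 / (1.0 + math.exp(-(b + sum(wi * zi for wi, zi in zip(w, z)))))

def make_data(n=40, seed=1):
    """Synthetic two-view dataset with balanced binary labels (illustrative only)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    view_a = [[rng.gauss(2.0 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]
    view_b = [[rng.gauss(-1.5 * t, 1.0)] for t in y]
    return [view_a, view_b], y

views, y = make_data()
Z = out_of_fold(views, y, [centroid_scorer, knn_scorer])  # base-learner predictions
meta = fit_logistic(Z, y)                                  # stacked meta-learner
accuracy = sum((meta(z) >= 0.5) == bool(t) for z, t in zip(Z, y)) / len(y)
print(f"meta-learner accuracy on out-of-fold features: {accuracy:.2f}")
```

The accuracy printed here is computed on the same out-of-fold matrix used to fit the meta-learner, so it is optimistic. An unbiased estimate would need an outer validation loop or an independent test set.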
