Abstract
A considerable proportion of patients with colorectal cancer are at high risk of disease recurrence after surgery. These patients can be identified by analyzing the expression profiles of signature genes in tumors. However, there is no consensus on which genes should be used, and the performance of a given set of signature genes varies greatly across datasets, impeding its routine clinical application. Instead of using individual genes, here we identified functional multi-gene modules with significant expression changes between recurrent and recurrence-free tumors and used them as signatures for predicting colorectal cancer recurrence in multiple datasets that were collected independently and profiled on different microarray platforms. The multi-gene modules we identified are significantly enriched in known genes and biological processes relevant to cancer development, including genes from the chemokine pathway. Most strikingly, they are significantly enriched in somatic mutations found in colorectal cancer. These results confirm the functional relevance of these modules for colorectal cancer development. Further, the functional modules identified from different datasets overlapped significantly. Finally, we demonstrated that, by leveraging this information about the modules, our module-based classifier avoided arbitrarily fitting the classifier function and screening signatures on the training data, and achieved more consistent prognosis prediction across three independent datasets, even when trained on very small sets of tumors.
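Claims of "significant enrichment" (of chemokine-pathway genes or somatic mutations in the modules) and of "significant overlap" between modules from different datasets are commonly assessed with a one-sided hypergeometric test. The abstract does not say which test was used, so the following is only a minimal sketch under that assumption: it computes P(X ≥ k), where k is the number of genes two sets share and both are drawn from a background of N genes. The function name `overlap_pvalue`, the 20,000-gene background, and the toy gene sets are hypothetical.

```python
from math import comb

def overlap_pvalue(set_a, set_b, universe_size):
    """One-sided hypergeometric p-value P(X >= k), where k is the observed
    number of genes shared by set_a and set_b, both drawn from a background
    of universe_size genes. (Assumed test; not confirmed by the source.)"""
    k = len(set_a & set_b)
    n_a, n_b = len(set_a), len(set_b)
    if n_a > universe_size or n_b > universe_size:
        raise ValueError("gene sets cannot be larger than the universe")
    # Ways to draw set_b with exactly i genes inside set_a, summed over i >= k.
    tail = sum(comb(n_a, i) * comb(universe_size - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    return tail / comb(universe_size, n_b)

# Hypothetical example: modules of 200 and 150 genes sharing 30 genes, against
# a 20,000-gene background (about 1.5 shared genes expected by chance).
module_a = {f"g{i}" for i in range(200)}
module_b = {f"g{i}" for i in range(170, 320)}
p_shared = overlap_pvalue(module_a, module_b, 20000)

# Disjoint sets: P(X >= 0) is exactly 1.
p_disjoint = overlap_pvalue(module_a, {f"g{i}" for i in range(1000, 1150)}, 20000)
```

The same function covers enrichment: pass a module as `set_a` and a known gene list (for example, chemokine-pathway genes or frequently mutated genes) as `set_b`. Exact integer arithmetic via `math.comb` avoids the underflow that naive floating-point factorials would hit for large backgrounds.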
Highlights
Colorectal cancer is a leading cause of cancer mortality
We used two independent datasets of patients with early-stage colorectal cancer to test two key hypotheses: (1) the most differentially expressed modules are non-randomly associated with tumor recurrence; (2) such modules identified from different datasets share significantly more genes than expected by chance
Instead of constructing a single giant network, we used protein interaction data to build a separate network for each Gene Ontology (GO) gene set and identified multi-gene modules, i.e. groups of genes that are densely connected in the network and relatively separate from the rest of it
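The highlights do not specify the module-finding algorithm, so the following is a simplified, hypothetical stand-in for that step: it keeps only protein-protein interactions whose two partners both belong to one GO gene set, splits the resulting subnetwork into connected components (which are separate from one another by construction), and keeps components that are large and dense enough. The names `find_modules`, `min_size`, and `min_density`, the threshold values, and the toy edges are all assumptions. Dedicated graph-clustering methods would also split a large component into several dense modules, which this sketch does not attempt.

```python
from collections import defaultdict

def go_subnetwork(ppi_edges, go_genes):
    """Adjacency sets keeping only interactions whose partners are both in the GO set."""
    adj = defaultdict(set)
    for a, b in ppi_edges:
        if a != b and a in go_genes and b in go_genes:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def connected_components(adj):
    """Iterative depth-first search; returns a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

def density(adj, nodes):
    """Observed internal edges divided by the n*(n-1)/2 possible ones."""
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(len(adj[v] & nodes) for v in nodes) // 2
    return internal / (n * (n - 1) / 2)

def find_modules(ppi_edges, go_genes, min_size=3, min_density=0.5):
    """Hypothetical module finder: large, dense components of one GO subnetwork."""
    adj = go_subnetwork(ppi_edges, go_genes)
    return [c for c in connected_components(adj)
            if len(c) >= min_size and density(adj, c) >= min_density]

# Toy interactions; X is outside the GO set, so the X-A edge is ignored,
# and the E-F pair is dropped as too small.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F"), ("X", "A")]
go_genes = {"A", "B", "C", "D", "E", "F"}
modules = find_modules(edges, go_genes)
```

Building one small network per GO gene set, as the highlight describes, keeps every candidate module tied to a known biological function, which is what allows the later enrichment and overlap tests to be interpreted functionally.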
Summary
Colorectal cancer is a leading cause of cancer mortality. About 20–30% of patients at stage II and 50% of patients at stage III experience disease recurrence after surgery [1]. Recent studies have suggested that the expression profiles of multi-gene signatures predict prognosis for patients with colorectal cancer better than traditional methods based on clinical or pathological features, and some such signatures are entering the market [2,3,4,5,6,7]. These signature genes are typically selected from genes differentially expressed between training tumors from patients with and without disease recurrence. The two steps, gene selection and classifier construction, are then iterated to optimize both choices.
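For contrast with the module-based approach, the conventional gene-level selection step described above can be sketched as ranking genes by a two-sample statistic between recurrent and recurrence-free tumors. The source does not name the statistic, so Welch's t-statistic is an assumption here. The sketch also scores a module as the per-sample mean z-score of its genes, one simple (hypothetical) way to apply the same test to a multi-gene module; this assumes the member genes change in the same direction. All names and toy values are illustrative.

```python
import math
from statistics import mean, stdev

def welch_t(a, b):
    """Welch's t-statistic for group a versus group b (unequal variances)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

def split(values, recurrent):
    """Split one row of per-sample values into (recurrent, recurrence-free)."""
    rec = [v for v, r in zip(values, recurrent) if r]
    free = [v for v, r in zip(values, recurrent) if not r]
    return rec, free

def rank_genes(expr, recurrent):
    """Gene-level approach: sort genes by |t|, most differential first."""
    scores = {g: welch_t(*split(v, recurrent)) for g, v in expr.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)

def module_activity(expr, module_genes):
    """Per-sample mean z-score over the module genes present in the data.
    Assumes the member genes change in the same direction."""
    genes = [g for g in module_genes if g in expr]
    if not genes:
        raise ValueError("no module genes found in expression data")
    z = {}
    for g in genes:
        m, s = mean(expr[g]), stdev(expr[g])
        z[g] = [(v - m) / s if s > 0 else 0.0 for v in expr[g]]
    n_samples = len(expr[genes[0]])
    return [mean(z[g][j] for g in genes) for j in range(n_samples)]

# Toy data: 3 recurrent samples followed by 3 recurrence-free samples.
expr = {
    "G1": [5.0, 5.5, 6.0, 2.0, 2.5, 3.0],
    "G2": [4.0, 4.2, 4.8, 1.0, 1.5, 1.2],
    "G3": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],
}
recurrent = [True, True, True, False, False, False]

ranking = rank_genes(expr, recurrent)
activity = module_activity(expr, ["G1", "G2", "G9"])  # G9 is absent and skipped
module_t = welch_t(*split(activity, recurrent))
```

In the gene-level pipeline, the cut-off on this ranking and the classifier are tuned together on the training tumors, which is the fitting-and-screening step the abstract says the module-based classifier avoids.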