Abstract
There is a great deal of prior knowledge about gene function and regulation in the form of annotations or prior results that, if directly integrated into individual prognostic or diagnostic studies, could improve predictive performance. For example, in a study to develop a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes based on pathways constitute such prior knowledge. However, this external information is typically only used post-analysis to aid in the interpretation of any findings. We propose a new hierarchical two-level ridge regression model that can integrate external information in the form of "meta features" to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation we show that the proposed method outperforms standard ridge regression and competing methods that integrate prior information, in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative. We demonstrate our approach with applications to the prediction of chronological age based on methylation features and breast cancer mortality based on gene expression features.
Highlights
In genomic studies, there is often a great deal of prior knowledge about the genomic features that are being modeled
In a simulation-based evaluation we show that the proposed method outperforms standard ridge regression and competing methods that integrate prior information, in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative
We show that the two-level ridge regression can be reformulated into a single-level ridge regression with two tuning parameters, enabling an efficient model coordinate descent fitting algorithm that can handle large numbers of features and meta-features
Summary
There is often a great deal of prior knowledge about the genomic features that are being modeled. The Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) study includes cDNA microarray profiling of close to two thousand breast cancer patients and patients’ survival information within the study follow-up (Curtis et al, 2012). In this example, which we later use to illustrate our approach, we are interested in predicting patient mortality based on their gene expression profiles. As potentially informative meta features we consider the attractor metagenes identified by Cheng et al (2013) These are groups of genes that capture molecular events known to be associated with clinical outcomes in many cancers.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.