Abstract

There is a great deal of prior knowledge about gene function and regulation in the form of annotations or prior results that, if directly integrated into individual prognostic or diagnostic studies, could improve predictive performance. For example, in a study to develop a predictive model for cancer survival based on gene expression, effect sizes from previous studies or the grouping of genes based on pathways constitute such prior knowledge. However, this external information is typically only used post-analysis to aid in the interpretation of any findings. We propose a new hierarchical two-level ridge regression model that can integrate external information in the form of "meta features" to predict an outcome. We show that the model can be fit efficiently using cyclic coordinate descent by recasting the problem as a single-level regression model. In a simulation-based evaluation, we show that the proposed method outperforms both standard ridge regression and competing methods that integrate prior information in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative. We demonstrate our approach with applications to the prediction of chronological age based on methylation features and breast cancer mortality based on gene expression features.

Highlights

  • In genomic studies, there is often a great deal of prior knowledge about the genomic features that are being modeled

  • In a simulation-based evaluation, we show that the proposed method outperforms both standard ridge regression and competing methods that integrate prior information in terms of prediction performance when the meta features are informative on the mean of the features, and that there is no loss in performance when the meta features are uninformative

  • We show that the two-level ridge regression can be reformulated as a single-level ridge regression with two tuning parameters, enabling an efficient coordinate descent fitting algorithm that can handle large numbers of features and meta features
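
The reformulation described in the last highlight can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the two-level model has the form y = Xβ + ε with β = Zα + γ, where Z is the feature-by-meta-feature matrix, and that the residual effects γ and the meta-feature effects α receive separate ridge penalties lam1 and lam2. Substituting β = Zα + γ gives a single ridge problem on the stacked design [X, XZ]. For brevity the sketch uses a closed-form solve in place of cyclic coordinate descent and omits an intercept; the function name fit_two_level_ridge and all variable names are hypothetical.

```python
import numpy as np

def fit_two_level_ridge(X, y, Z, lam1, lam2):
    """Sketch of a two-level ridge fit via a single-level reformulation.

    Assumed objective (not taken verbatim from the paper):
        ||y - X beta||^2 + lam1 * ||beta - Z alpha||^2 + lam2 * ||alpha||^2
    With gamma = beta - Z alpha this becomes an ordinary ridge problem in
    (gamma, alpha) on the design [X, X Z] with two penalty levels.
    """
    p = X.shape[1]
    q = Z.shape[1]
    # Stacked single-level design: first p columns for gamma, last q for alpha.
    W = np.hstack([X, X @ Z])
    # Per-coefficient penalties: lam1 on gamma, lam2 on alpha.
    penalty = np.concatenate([np.full(p, lam1), np.full(q, lam2)])
    # Closed-form ridge solution (stand-in for cyclic coordinate descent).
    theta = np.linalg.solve(W.T @ W + np.diag(penalty), W.T @ y)
    gamma, alpha = theta[:p], theta[p:]
    # Recover the feature-level effects from the two components.
    beta = Z @ alpha + gamma
    return beta, alpha

# Simulated usage: 50 samples, 20 features, 3 meta features.
rng = np.random.default_rng(0)
n, p, q = 50, 20, 3
X = rng.normal(size=(n, p))
Z = rng.normal(size=(p, q))
alpha_true = np.array([1.0, -0.5, 0.0])
beta_true = Z @ alpha_true + 0.1 * rng.normal(size=p)
y = X @ beta_true + rng.normal(size=n)

beta_hat, alpha_hat = fit_two_level_ridge(X, y, Z, lam1=2.0, lam2=0.5)
```

In practice the two tuning parameters would be chosen by cross-validation, and a coordinate descent solver would replace the dense solve when the numbers of features and meta features are large.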


Introduction

There is often a great deal of prior knowledge about the genomic features that are being modeled. For example, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) study includes cDNA microarray profiling of close to two thousand breast cancer patients, along with the patients’ survival information within the study follow-up (Curtis et al, 2012). In this example, which we later use to illustrate our approach, we are interested in predicting patient mortality based on gene expression profiles. As potentially informative meta features, we consider the attractor metagenes identified by Cheng et al (2013). These are groups of genes that capture molecular events known to be associated with clinical outcomes in many cancers.
