Abstract
Background
Several gene sets for predicting breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combining the information from such sets may improve prediction of recurrence and breast-cancer-specific death in early-stage breast cancer. Microarray data from two clinically similar cohorts of breast cancer patients are used as the training set (n = 123) and the test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study.

Principal Findings
To investigate the relationship between breast cancer survival and gene expression for a particular gene set, a Cox proportional hazards model is fitted by partial likelihood regression with an L2 penalty to avoid overfitting. Cross-validation determines the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of their vectors of predicted risks yields two clusters with distinct clinical characteristics: they differ in the distribution of molecular subtypes, ER and PR status, TP53 mutation status and histological grade category. The two clusters also have significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors, which are then used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). For survival prediction, this combined classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and Adjuvant! Online.

Conclusion
Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. The presented methodology is broadly applicable to breast cancer risk assessment using any newly identified gene set.
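The penalized Cox step described under Principal Findings can be sketched in Python as follows. This is a minimal illustration, not the study's implementation. It assumes NumPy and SciPy, Breslow handling of tied event times, and the cross-validated partial likelihood of Verweij and van Houwelingen as the cross-validation criterion. The function names (`cox_loglik_grad`, `fit_ridge_cox`, `cv_partial_likelihood`, `select_lambda`), the penalty grid and the simulated data are all hypothetical. In practice, the expression values of each gene set would typically be standardized before fitting.

```python
import numpy as np
from scipy.optimize import minimize


def cox_loglik_grad(beta, X, time, event):
    """Breslow log partial likelihood of a Cox model and its gradient."""
    eta = X @ beta
    c = eta.max()                                   # shift for numerical stability
    w = np.exp(eta - c)
    # at_risk[i, j] = 1 when subject j is still at risk at subject i's time
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    s = at_risk @ w                                 # risk-set sums, scaled by exp(-c)
    xbar = (at_risk @ (w[:, None] * X)) / s[:, None]
    loglik = np.sum(event * (eta - c - np.log(s)))
    grad = (event[:, None] * (X - xbar)).sum(axis=0)
    return loglik, grad


def fit_ridge_cox(X, time, event, lam):
    """Maximize log partial likelihood minus (lam / 2) * ||beta||^2."""
    def objective(beta):
        ll, g = cox_loglik_grad(beta, X, time, event)
        return -(ll - 0.5 * lam * beta @ beta), -(g - lam * beta)
    res = minimize(objective, np.zeros(X.shape[1]), jac=True, method="L-BFGS-B")
    return res.x


def cv_partial_likelihood(X, time, event, lam, n_folds=5, seed=0):
    """Cross-validated partial likelihood: sum over folds of l(b_-k) - l_-k(b_-k)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(time)) % n_folds    # balanced random fold labels
    cvl = 0.0
    for k in range(n_folds):
        train = folds != k
        beta = fit_ridge_cox(X[train], time[train], event[train], lam)
        full_ll, _ = cox_loglik_grad(beta, X, time, event)
        train_ll, _ = cox_loglik_grad(beta, X[train], time[train], event[train])
        cvl += full_ll - train_ll
    return cvl


def select_lambda(X, time, event, grid):
    """Pick the penalty weight with the highest cross-validated partial likelihood."""
    scores = [cv_partial_likelihood(X, time, event, lam) for lam in grid]
    return grid[int(np.argmax(scores))]


# Toy usage on simulated data (not the study's cohorts).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(150, 5))
true_beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
event_time = rng.exponential(np.exp(-X_train @ true_beta))   # hazard exp(X beta)
censor_time = rng.exponential(2.0, size=150)
time_train = np.minimum(event_time, censor_time)
event_train = (event_time <= censor_time).astype(float)

grid = [0.1, 1.0, 10.0, 100.0]
best_lam = select_lambda(X_train, time_train, event_train, grid)
beta_hat = fit_ridge_cox(X_train, time_train, event_train, best_lam)
X_test = rng.normal(size=(10, 5))
test_risk = X_test @ beta_hat        # linear predictor used as predicted risk
```

The linear predictor serves as the predicted risk for one gene set. Repeating the fit for each of the eleven gene sets gives every test individual a vector of eleven predicted risks.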
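The next step clusters test individuals by their vectors of predicted risks and then compares survival between the two clusters. It might look like the sketch below. Average linkage with Euclidean distance and the two-sample log-rank test are assumptions made for illustration, because the abstract reports p-values without naming the linkage or the test. The helper names and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import chi2


def cluster_risk_profiles(risk_matrix, n_clusters=2, method="average"):
    """Cluster individuals (rows) by their vectors of per-signature predicted risks."""
    tree = linkage(risk_matrix, method=method, metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")   # labels 1..n_clusters


def logrank_test(time, event, group):
    """Two-sample log-rank test; `group` is a boolean mask selecting group 1."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    observed = expected = variance = 0.0
    for t in np.unique(time[event]):             # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        d = (event & (time == t)).sum()
        d1 = (event & (time == t) & group).sum()
        observed += d1
        expected += d * n1 / n
        if n > 1:                                 # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = (observed - expected) ** 2 / variance
    return stat, chi2.sf(stat, df=1)


# Toy data: 40 individuals x 11 signatures with two well-separated risk profiles.
rng = np.random.default_rng(2)
risks = np.vstack([rng.normal(0.0, 0.3, size=(20, 11)),
                   rng.normal(3.0, 0.3, size=(20, 11))])
labels = cluster_risk_profiles(risks)
surv_time = np.concatenate([rng.exponential(5.0, 20), rng.exponential(0.5, 20)])
died = np.ones(40, dtype=bool)
stat, p_value = logrank_test(surv_time, died, labels == labels[-1])
```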
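The combined predictor can be read in several ways. One possible reading is that principal components are computed from the training-set matrix of per-signature risks, and that the leading component scores then serve as covariates in a new Cox model. The sketch below assumes that reading and shows the PCA step and the split into two risk groups. Test individuals are projected with the training mean and loadings, never refitted. The coefficients `gamma` are a hypothetical placeholder for a Cox fit on the training scores. The median-of-training threshold, the function names and the simulated data are also assumptions.

```python
import numpy as np


def pca_fit(train_risks):
    """PCA of an (individuals x signatures) risk matrix via SVD of centered data."""
    mean = train_risks.mean(axis=0)
    _, sing_vals, vt = np.linalg.svd(train_risks - mean, full_matrices=False)
    explained = sing_vals**2 / np.sum(sing_vals**2)   # variance ratio per component
    return mean, vt.T, explained                      # columns of vt.T are loadings


def pca_scores(risks, mean, loadings, n_components):
    """Project individuals using the *training* mean and loadings."""
    return (risks - mean) @ loadings[:, :n_components]


def dichotomize(linear_predictor, threshold):
    """Label individuals as high risk (True) when their predictor exceeds threshold."""
    return linear_predictor > threshold


# Simulated risks for 11 signatures sharing one common factor (toy data only).
rng = np.random.default_rng(3)
train_risks = rng.normal(size=(123, 1)) + rng.normal(0.0, 0.3, size=(123, 11))
test_risks = rng.normal(size=(81, 1)) + rng.normal(0.0, 0.3, size=(81, 11))

mean, loadings, explained = pca_fit(train_risks)
train_scores = pca_scores(train_risks, mean, loadings, n_components=2)
test_scores = pca_scores(test_risks, mean, loadings, n_components=2)

gamma = np.array([0.8, 0.1])   # hypothetical: would come from a Cox fit on train_scores
threshold = np.median(train_scores @ gamma)
test_high_risk = dichotomize(test_scores @ gamma, threshold)
```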
Highlights
Cancer is a complex disease characterized by a number of genetic and epigenetic abnormalities
It should be emphasized that the main aim of our study is to present a method for exploratory analyses and for seeding gene selection algorithms with previously known gene sets, rather than to claim a method for producing optimized gene signatures that compete with their published counterparts
Cross-platform gene mapping: using the gene mapping procedure described in Methods, we were able to identify and map at least 80% of the genes from each of the originally published gene sets to the Stanford 43k cDNA array
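A coverage figure of this kind can be illustrated with a symbol-based mapping. The actual procedure is the one described in Methods and may rely on other identifiers. This sketch assumes matching on upper-cased gene symbols with an optional alias table. The function name `map_gene_set` and the toy gene lists are hypothetical.

```python
def map_gene_set(signature, array_symbols, aliases=None):
    """Map a signature's gene symbols onto an array; return (mapped, coverage)."""
    aliases = aliases or {}                       # hypothetical alias -> official symbol
    available = {s.upper() for s in array_symbols}
    mapped = []
    for gene in signature:
        symbol = aliases.get(gene.upper(), gene.upper())
        if symbol in available:
            mapped.append(symbol)
    return mapped, len(mapped) / len(signature)


# Hypothetical toy inputs; real signatures and array annotations are far larger.
signature = ["ESR1", "HER2", "MKI67", "AURKA", "BIRC5"]
array_symbols = ["ESR1", "ERBB2", "MKI67", "BIRC5", "GAPDH"]
mapped, coverage = map_gene_set(signature, array_symbols, {"HER2": "ERBB2"})
meets_threshold = coverage >= 0.80               # coverage = 4 / 5 = 0.8
```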
Summary
Cancer is a complex disease characterized by a number of genetic and epigenetic abnormalities. A number of tumor classification algorithms based on gene expression profiles have been proposed that use clinical data or known biological class labels to build predictive models for outcome, e.g. the 70-gene signature MammaPrint® [3], the 76-gene signature of Wang et al. [8] and the Genomic Grade Index [4]. Published gene signatures that are predictive of clinical outcome in breast cancer are partly or completely based on different genes. Even so, their predictions often agree well when new patient samples are assigned to poor- and good-outcome groups [13,14]. Gene sets from eleven previously published gene signatures are included in the study.
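Concordance between the poor/good-outcome assignments of two signatures can be quantified in several ways. The sketch below uses raw agreement and Cohen's kappa as an illustrative choice. These are not necessarily the measures used in [13,14], and the labels are made up.

```python
def agreement_and_kappa(labels_a, labels_b):
    """Raw agreement and Cohen's kappa for two outcome-group assignments."""
    n = len(labels_a)
    categories = sorted(set(labels_a) | set(labels_b))
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each signature's marginal frequencies
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


sig_a = ["poor", "poor", "poor", "good", "good", "good", "good", "good", "poor", "good"]
sig_b = ["poor", "poor", "good", "good", "good", "good", "good", "poor", "poor", "good"]
observed, kappa = agreement_and_kappa(sig_a, sig_b)
# observed = 0.8, kappa = (0.8 - 0.52) / (1 - 0.52) ≈ 0.583
```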