Integration of Clinical and Gene Expression Data Has a Synergetic Effect on Predicting Breast Cancer Outcome

Martin H Van Vliet, Lodewyk F A Wessels, Marc J Van De Vijver, Marcel J T Reinders, Hugo M Horlings

doi:10.1371/journal.pone.0040358

Abstract

Breast cancer outcome can be predicted using models derived from gene expression data or clinical data. Only a few studies have created a single prediction model using both gene expression and clinical data, and these studies often remain inconclusive about whether prediction performance actually improves. We rigorously compare three integration strategies (early, intermediate, and late integration), as well as classifiers employing no integration (only one data type), using five classifiers of varying complexity. We perform our analysis on a set of 295 breast cancer samples, for which gene expression data and an extensive set of clinical parameters are available, as well as on four breast cancer datasets containing 521 samples that we used for independent validation. On the 295 samples, a nearest mean classifier employing a logical OR operation (late integration) on clinical and expression classifiers significantly outperforms all other classifiers. Moreover, regardless of the integration strategy, the nearest mean classifier achieves the best performance. All five classifiers achieve their best performance when integrating clinical and expression data. Repeating the experiments on the 521 samples from the four independent validation datasets also indicated a significant performance improvement when integrating clinical and gene expression data. Whether integration also improves performance on other datasets (e.g. other tumor types) has not been investigated, but seems worth pursuing. Our work suggests that future models for predicting breast cancer outcome should exploit both data types by employing a late OR or intermediate integration strategy based on nearest mean classifiers.
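The nearest mean classifier and two of the integration strategies named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes squared Euclidean distance to the class means, labels coded 1 = poor outcome and 0 = good outcome, and hypothetical helper names (`NearestMeanClassifier`, `early_integration`, `late_or_integration`). Intermediate integration is left out because the abstract does not say how it was implemented.

```python
import numpy as np


class NearestMeanClassifier:
    """Assign each sample to the class whose training mean is closest.

    Assumption: squared Euclidean distance; the original study may use a
    different distance or feature scaling."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector per class (e.g. good vs poor outcome).
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every class mean.
        d = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def early_integration(Xc_train, Xe_train, y, Xc_test, Xe_test):
    """Hypothetical helper: concatenate clinical and expression features, fit one classifier."""
    clf = NearestMeanClassifier().fit(np.hstack([Xc_train, Xe_train]), y)
    return clf.predict(np.hstack([Xc_test, Xe_test]))


def late_or_integration(Xc_train, Xe_train, y, Xc_test, Xe_test):
    """Hypothetical helper: one classifier per data type; predict poor outcome (1)
    if EITHER classifier predicts poor outcome (logical OR of the two decisions)."""
    clin = NearestMeanClassifier().fit(Xc_train, y)
    expr = NearestMeanClassifier().fit(Xe_train, y)
    poor = (clin.predict(Xc_test) == 1) | (expr.predict(Xe_test) == 1)
    return poor.astype(int)


# Toy data (made up for illustration): labels 0 = good outcome, 1 = poor outcome.
Xc_train = np.array([[0.0], [0.2], [1.0], [1.2]])
Xe_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
Xc_test = np.array([[0.1], [1.1], [0.1]])
Xe_test = np.array([[0.05, 0.0], [0.1, 0.0], [1.0, 1.0]])

early = early_integration(Xc_train, Xe_train, y, Xc_test, Xe_test)
late_or = late_or_integration(Xc_train, Xe_train, y, Xc_test, Xe_test)
print(early, late_or)  # [0 0 1] [0 1 1]
```

In this toy example the second test sample has a poor-outcome clinical profile but a good-outcome expression profile: the OR rule flags it as poor outcome, while the early-integration classifier does not. By construction, an OR rule can only add poor-outcome calls relative to either single-data-type classifier, which raises sensitivity at the cost of specificity.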

Highlights

Many predictors of breast cancer outcome have been published
This is a clear indication that there is synergy between the two data types, and that the late OR integration strategy provides a way to exploit the synergy
Integration results in higher area under the curve (AUC) performance. In the DLCV procedure, we optimized the number of features by minimizing the eFPFN error.
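The highlights report performance as the area under the ROC curve (AUC). A minimal sketch of how AUC can be computed from continuous classifier scores follows, assuming higher scores indicate poorer outcome and labels coded 1 = poor, 0 = good. The function name `auc` is hypothetical, and neither the DLCV procedure nor the eFPFN criterion is reproduced here.

```python
import numpy as np


def auc(scores, labels):
    """Rank-based AUC: the fraction of (poor, good) sample pairs in which the
    poor-outcome sample (label 1) scores higher; ties count as one half.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one sample of each class")
    # Pairwise differences between every positive and every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.count_nonzero(diff > 0) + 0.5 * np.count_nonzero(diff == 0)
    return wins / (pos.size * neg.size)


# Toy example: 3 of the 4 (poor, good) pairs are ranked correctly.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why it is a convenient single number for comparing the integrated and single-data-type classifiers.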

Summary

Introduction

Many predictors of breast cancer outcome have been published. These predictors have been derived either from gene expression data, such as the 70-gene (van 't Veer et al [1]) and 76-gene (Wang et al [2]) signatures, or from clinical data, such as the Nottingham Prognostic Index (NPI, [3]) and Adjuvant! Online [4]. Stratifications for ER and HER2 have been made using gene expression data rather than clinical data, which could lead to better prognostic value [8]. Most of these studies have employed a set of standard clinical variables, such as ER status, tumor grade, and tumor size. Horlings et al (in preparation, [9]) have characterized additional clinical features (e.g. matrix formation and central fibrosis) for an existing cohort of 295 breast cancer samples [10]. By themselves, these additional clinical variables have independent prognostic power. Whether and how this power can be used to build a better classifier for outcome prediction has not been investigated.
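For context, the Nottingham Prognostic Index cited above is a simple weighted sum of standard clinical variables, which illustrates what a clinical-data predictor looks like. The sketch below uses its standard published form (0.2 × tumor size in cm + lymph node stage + histological grade); the function names are hypothetical, and the code is not part of the method described here.

```python
def node_stage(positive_nodes: int) -> int:
    """NPI lymph node stage: 1 = node-negative, 2 = 1-3 positive nodes, 3 = 4 or more."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def nottingham_prognostic_index(tumor_size_cm: float, positive_nodes: int, grade: int) -> float:
    """NPI = 0.2 * tumor size (cm) + lymph node stage (1-3) + histological grade (1-3)."""
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2 or 3")
    return 0.2 * tumor_size_cm + node_stage(positive_nodes) + grade


# A 2 cm, node-negative, grade 2 tumor.
example_npi = nottingham_prognostic_index(2.0, 0, 2)
print(example_npi)  # 3.4
```

Lower NPI values indicate a better prognosis. The paper's point is that additional clinical features and gene expression data can add prognostic information beyond such fixed clinical scores.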

Journal: PLoS ONE
Publication Date: Jul 11, 2012
Citations: 69
License type: CC BY 4.0

Similar Papers

Comprehensive analysis integrating both clinicopathological and gene expression data in more than 1,500 samples: Proliferation captured by gene expression grade index appears to be the strongest prognostic factor in breast cancer (BC)
C Sotiriou ... M Delorenzi. Journal of Clinical Oncology, Vol. 24, 20 Jun 2006.

Combining gene expression, demographic and clinical data in modeling disease: a case study of bipolar disorder and schizophrenia
Jan Struyf ... Seth Dobrin. BMC Genomics, Vol. 9, 01 Jan 2008.

Comparative Analysis of Different Label-Free Mass Spectrometry Based Protein Abundance Estimates and Their Correlation with RNA-Seq Gene Expression Data
Kang Ning ... Alexey I Nesvizhskii. Journal of Proteome Research, Vol. 11, 29 Feb 2012.

A multivariate analysis approach to the integration of proteomic and gene expression data
Ailís Fagan ... Aedín C Culhane. PROTEOMICS, Vol. 7, 01 Jun 2007.
