Abstract

Background

The increasing number of gene expression microarray studies represents an important resource in biomedical research. As a result, gene expression based diagnosis has entered clinical practice for patient stratification in breast cancer. However, the integration and combined analysis of microarray studies remains a challenge. We assessed the potential benefit of data integration on classification accuracy and systematically evaluated the generalization performance of selected methods on four breast cancer studies comprising almost 1000 independent samples. To this end, we introduced an evaluation framework which aims to establish good statistical practice and a graphical way to monitor differences. The classification goal was to correctly predict estrogen receptor status (negative/positive) and histological grade (low/high) of each tumor sample in an independent study which was not used for training. For the classification we chose support vector machines (SVM), predictive analysis of microarrays (PAM), random forest (RF) and k-top scoring pairs (kTSP). Guided by considerations relevant for classification across studies, we developed a generalization of kTSP, which we evaluated in addition. Our derived version (DV) aims to improve the robustness of the intrinsic invariance of kTSP with respect to technologies and preprocessing.

Results

For each individual study the generalization error was benchmarked via complete cross-validation and was found to be similar for all classification methods. The misclassification rates were substantially higher in classification across studies, when each single study was used as an independent test set while all remaining studies were combined for training the classifier. However, with an increasing number of independent microarray studies used in training, the overall classification performance improved. DV performed better than the average and showed slightly less variance. In particular, the better predictive results of DV in cross-platform classification indicate higher robustness of the classifier when trained on single-channel data and applied to gene expression ratios.

Conclusions

We present a systematic evaluation of strategies for the integration of independent microarray studies in a classification task. Our findings in classification across studies may guide further research aimed at the construction of more robust and reliable methods for stratification and diagnosis in clinical practice.
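The invariance of kTSP with respect to technologies and preprocessing comes from the fact that a top scoring pair rule only looks at the ordering of two genes within a sample, so any strictly increasing transformation of a sample's values leaves its prediction unchanged. The following is a minimal sketch of the basic single-pair rule (k = 1), not of the DV method evaluated in the paper. The function names and the toy data are hypothetical and chosen for illustration only, and the dense pairwise comparison is practical only for a small number of genes.

```python
import numpy as np

def fit_top_scoring_pair(X, y):
    """Pick the gene pair (i, j) whose ordering differs most between two classes.

    X: (n_samples, n_genes) expression matrix; y: binary labels coded 0/1.
    Illustrative sketch of the basic single-pair rule, not the paper's DV method.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[s, i, j] is True when gene i is expressed below gene j in sample s
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)  # estimate of P(X_i < X_j | class 0)
    p1 = less[y == 1].mean(axis=0)  # estimate of P(X_i < X_j | class 1)
    score = np.abs(p1 - p0)
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return {
        "i": int(i),
        "j": int(j),
        # class predicted when gene i is below gene j
        "class_if_less": int(p1[i, j] > p0[i, j]),
        "score": float(score[i, j]),
    }

def predict_top_scoring_pair(model, X):
    """Predict 0/1 labels from the within-sample ordering of the chosen pair."""
    X = np.asarray(X, dtype=float)
    is_less = X[:, model["i"]] < X[:, model["j"]]
    return np.where(is_less, model["class_if_less"], 1 - model["class_if_less"])

# Hypothetical toy data: 6 samples, 3 genes; genes 0 and 1 swap order between classes.
X_toy = np.array([
    [1.0, 2.0, 5.0],
    [0.0, 3.0, 1.0],
    [2.0, 4.0, 0.0],
    [5.0, 1.0, 2.0],
    [6.0, 2.0, 9.0],
    [4.0, 0.0, 3.0],
])
y_toy = np.array([0, 0, 0, 1, 1, 1])

tsp_model = fit_top_scoring_pair(X_toy, y_toy)
pred_raw = predict_top_scoring_pair(tsp_model, X_toy)
# A strictly increasing transformation (here exp) does not change any ordering.
pred_transformed = predict_top_scoring_pair(tsp_model, np.exp(X_toy))
```

In this sketch the predictions on the raw and on the exponentiated values coincide, which is the property that makes rank-based rules attractive when training and test data come from different platforms. A kTSP classifier would extend this by selecting k disjoint pairs and taking a majority vote.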

Highlights

  • The increasing number of gene expression microarray studies represents an important resource in biomedical research

  • We systematically evaluated the generalization performance of five selected methods (support vector machines (SVM), predictive analysis of microarrays (PAM), random forest (RF), k-top scoring pairs (kTSP) and a derived version (DV) of kTSP) on four breast cancer gene expression microarray studies comprising almost 1000 independent samples (Table 1)

  • The challenge was to predict estrogen receptor status and histological grade of a tumor sample in an independent study which was not used for the training
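The cross-study setup described in the highlights, where each study in turn serves as the independent test set and the remaining studies are pooled for training, can be sketched as a leave-one-study-out loop. This is an illustrative sketch only: the helper names, the toy data and the nearest-centroid placeholder classifier are assumptions made for the example and are not taken from the paper, which used SVM, PAM, RF, kTSP and DV.

```python
import numpy as np

def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_idx, test_idx) once per distinct study."""
    study_ids = np.asarray(study_ids)
    for study in np.unique(study_ids):
        test_mask = study_ids == study
        yield study, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

def cross_study_error(X, y, study_ids, fit, predict):
    """Misclassification rate on each held-out study, training on all others."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    errors = {}
    for study, train_idx, test_idx in leave_one_study_out(study_ids):
        model = fit(X[train_idx], y[train_idx])
        y_hat = predict(model, X[test_idx])
        errors[str(study)] = float(np.mean(y_hat != y[test_idx]))
    return errors

# Placeholder classifier (an assumption for this sketch, not a method from the paper).
def fit_centroids(X, y):
    """Store the mean expression profile of each class."""
    return {int(c): X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroids(model, X):
    """Assign each sample to the class with the nearest centroid."""
    classes = sorted(model)
    dists = np.stack(
        [np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1
    )
    return np.array(classes)[dists.argmin(axis=1)]

# Hypothetical toy data: three studies with two samples each and one feature.
X_demo = np.array([[0.0], [1.0], [0.1], [0.9], [0.2], [1.1]])
y_demo = np.array([0, 1, 0, 1, 0, 1])
studies_demo = np.array(["A", "A", "B", "B", "C", "C"])

errors_demo = cross_study_error(
    X_demo, y_demo, studies_demo, fit_centroids, predict_centroids
)
```

Any feature selection or parameter tuning would have to happen inside `fit`, on the training studies only, so that the held-out study stays independent. scikit-learn's `LeaveOneGroupOut` splitter provides the same kind of split if that library is available.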


Summary

Introduction

The increasing number of gene expression microarray studies represents an important resource in biomedical research. We assessed the potential benefit of data integration on classification accuracy and systematically evaluated the generalization performance of selected methods on four breast cancer studies comprising almost 1000 independent samples. To this end, we introduced an evaluation framework which aims to establish good statistical practice and a graphical way to monitor differences. In breast cancer, several prognostic gene signatures have been proposed [1,2,3,4]. To date, one has been approved for clinical diagnosis. Different protocols and technologies hamper such attempts and the translation to clinical practice. This affects predictive signatures derived from gene expression microarray data. On the other hand, promising classification results for the integration of studies have been reported [8], as well as a high level of concordance between several microarray-based and alternative technology platforms measuring gene expression [9].

