Abstract

Background: The abundance of molecular profiling data from breast cancer tissues has spurred active research on molecular marker-based early diagnosis of metastasis. Recently, interest has surged in combining gene expression with gene networks, such as protein-protein interaction (PPI) networks, gene co-expression (CE) networks, and pathway information, to identify robust and accurate biomarkers for metastasis prediction, reflecting the common belief that cancer is a systems biology disease. However, controversy exists in the literature over whether network markers are indeed better features than genes alone for predicting and understanding metastasis. We believe many existing results may have been biased by overly complicated prediction algorithms, unfair evaluation, and a lack of rigorous statistics. In this study, we propose a simple approach that uses network edges as features, based on two types of networks, and compare their predictive power using three classification algorithms and a rigorous statistical procedure on one of the largest datasets available. To detect biomarkers that are significant for prediction and to compare the robustness of different feature types, we propose a novel, unbiased procedure for measuring feature importance that eliminates potential bias from factors such as differing sample sizes, numbers of features, and class distributions.

Results: Edge-based feature types consistently outperformed the gene-based feature type in random forest and logistic regression models under all performance evaluation metrics. The prediction accuracy of the edge-based support vector machine (SVM) model was poorer, owing to the larger number of edge features compared with gene features and the lack of feature selection in the SVM model. The results also show that edge features are much more robust than gene features, and that the top biomarkers from edge feature types are more significantly enriched in biological processes well known to be related to breast cancer metastasis.

Conclusions: Overall, this study validates the utility of edge features as biomarkers, but it also highlights the importance of carefully designed experimental procedures for achieving statistically reliable comparisons.

Highlights

  • We present two edge-based feature types, derived from protein-protein interaction (PPI) and gene co-expression (CE) networks, and test them on the Amsterdam Classification Evaluation Suite (ACES) dataset, which includes more than 1600 patients from twelve patient cohorts
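
To make the idea of "network edges as features" concrete, here is a minimal sketch. The text above does not state how an edge's value is computed, so the formula used here (the sum of the z-scored expression of the two genes an edge connects) is purely an assumption for illustration, as are the function names `zscore_genes` and `edge_features` and the toy gene identifiers.

```python
from statistics import mean, pstdev


def zscore_genes(expr):
    """Z-score each gene across samples.

    expr maps a gene name to a list of expression values, one per sample.
    Genes with zero variance are mapped to all zeros.
    """
    scaled = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def edge_features(expr, edges):
    """Build one feature vector per network edge.

    ASSUMPTION (not taken from the paper): the value of edge (a, b) for a
    sample is the sum of the z-scored expression of genes a and b.
    Edges with a gene missing from the expression data are skipped.
    """
    z = zscore_genes(expr)
    features = {}
    for a, b in edges:
        if a in z and b in z:
            features[(a, b)] = [x + y for x, y in zip(z[a], z[b])]
    return features


# Toy data: three hypothetical genes measured in three samples.
toy_expr = {"G1": [1, 2, 3], "G2": [3, 2, 1], "G3": [1, 1, 1]}
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G9")]  # G9 has no data
toy_features = edge_features(toy_expr, toy_edges)
```

The same function could be applied to a PPI edge list or a CE edge list. Either way, the resulting matrix has one column per edge rather than one per gene, so it typically has many more features than the gene-based representation.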

Summary

Introduction

Molecular profiling of primary breast cancer tissues has enabled the development of machine learning models for early prediction of metastasis and has spurred active research on molecular marker-based early diagnosis. Patients who remain metastasis-free for at least 5 years are classified as having a good outcome, and patients who develop metastasis within 5 years are classified as having a poor outcome. We propose a simple approach that uses network edges as features, based on two types of networks, and compare their predictive power using three classification algorithms and a rigorous statistical procedure on one of the largest datasets available. To detect biomarkers that are significant for prediction and to compare the robustness of different feature types, we propose a novel, unbiased procedure for measuring feature importance that eliminates potential bias from factors such as differing sample sizes, numbers of features, and class distributions.
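
The good/poor outcome definition above can be sketched as a small labelling function. The function name `label_outcome`, its parameters, the handling of the exact 5-year boundary, and the decision to leave patients with less than 5 years of event-free follow-up unlabelled are all assumptions made for illustration; the text does not specify them.

```python
def label_outcome(metastasis_event, years, threshold_years=5.0):
    """Assign a binary outcome label to one patient.

    metastasis_event: True if metastasis was observed.
    years: time to metastasis if observed, otherwise length of follow-up.

    Returns "poor" for metastasis before the threshold, "good" for at least
    threshold_years without metastasis, and None when follow-up is too short
    to decide (ASSUMPTION: such patients are left unlabelled).
    """
    if years >= threshold_years:
        # Metastasis-free for at least the threshold period.
        return "good"
    if metastasis_event:
        # Metastasis observed before the threshold (boundary is an assumption).
        return "poor"
    return None


# Hypothetical patients: (metastasis observed?, years)
examples = [(True, 2.0), (False, 6.5), (True, 7.0), (False, 3.0)]
labels = [label_outcome(event, years) for event, years in examples]
```

Under these assumptions, a patient whose metastasis appears after the 5-year mark still counts as a good outcome, because the patient was metastasis-free for at least 5 years.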
