Abstract

BackgroundDiscovering robust markers for cancer prognosis based on gene expression data is an important yet challenging problem in translational bioinformatics. By integrating additional information in biological pathways or a protein-protein interaction (PPI) network, we can find better biomarkers that lead to more accurate and reproducible prognostic predictions. In fact, recent studies have shown that, “modular markers,” that integrate multiple genes with potential interactions can improve disease classification and also provide better understanding of the disease mechanisms.ResultsIn this work, we propose a novel algorithm for finding robust and effective subnetwork markers that can accurately predict cancer prognosis. To simultaneously discover multiple synergistic subnetwork markers in a human PPI network, we build on our previous work that uses affinity propagation, an efficient clustering algorithm based on a message-passing scheme. Using affinity propagation, we identify potential subnetwork markers that consist of discriminative genes that display coherent expression patterns and whose protein products are closely located on the PPI network. Furthermore, we incorporate the topological information from the PPI network to evaluate the potential of a given set of proteins to be involved in a functional module. Primarily, we adopt widely made assumptions that densely connected subnetworks may likely be potential functional modules and that proteins that are not directly connected but interact with similar sets of other proteins may share similar functionalities.ConclusionsIncorporating topological attributes based on these assumptions can enhance the prediction of potential subnetwork markers. We evaluate the performance of the proposed subnetwork marker identification method by performing classification experiments using multiple independent breast cancer gene expression datasets and PPI networks. We show that our method leads to the discovery of robust subnetwork markers that can improve cancer classification.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1224-1) contains supplementary material, which is available to authorized users.

Highlights

  • In this work, we focus on one of the problems in translational genomics which is the identification of biomarkers from microarray gene expression data to classify type or state of complex disease

  • Given two large-scale-dataset studies of breast cancer metastasis [1, 2]. Both studies tried to find out what would be the gene markers to look at in order to estimate the risk of cancer metastasis

  • We propose a novel method for incorporating protein-protein interaction (PPI) network topological information to enhance identification of subnetwork markers for predicting cancer prognosis

Read more

Summary

Introduction

We focus on one of the problems in translational genomics which is the identification of biomarkers from microarray gene expression data to classify type or state of complex disease. This problem is generally challenging and practically difficult because it normally involves with: 1) Small sample size of clinical data, 2) Large number of potential markers, and 3) Heterogeneity across patient and samples. Several studies have been working on identifying gene markers which are selected based solely on gene expression data These markers have shown to be useful to build classifiers for disease prediction. Recent studies have shown that, “modular markers,” that integrate multiple genes with potential interactions can improve disease classification and provide better understanding of the disease mechanisms

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call