Abstract
Identifying effective biomarkers to battle complex diseases is an important but challenging task in biomedical research today. Molecular data on complex diseases are increasingly abundant owing to rapid advances in high-throughput technologies. However, a large gap remains in linking these massive molecular data to phenotypic changes, particularly at the network level. A method for identifying network biomarkers is therefore urgently needed to accurately classify and diagnose diseases from molecular data and to shed light on the mechanisms of disease pathogenesis. Rather than seeking differential genes at the individual-molecule level, here we propose a novel method for identifying network biomarkers based on protein-protein interaction affinity (PPIA), which identifies differential interactions at the network level. Specifically, we first define PPIAs by estimating the concentrations of protein complexes from gene expression data using the law of mass action. We then select a small, non-redundant group of protein-protein interactions and single proteins, according to the PPIAs, that maximizes the ability to discriminate cases from controls. The method is formulated mathematically as a linear programming problem, which can be solved efficiently and guarantees a globally optimal solution. Extensive results on experimental breast cancer data demonstrate the effectiveness and efficiency of the proposed method for identifying network biomarkers, which not only distinguish the phenotypes accurately but also provide significant biological insights at the network or pathway level. In addition, our method provides a new way to integrate static protein-protein interaction information with dynamic gene expression data.
Highlights
The rapid advance of high-throughput technologies opens a new way for biomarker identification, which is an important but challenging task in biomedical research
We propose a novel method to estimate protein-protein interaction affinity (PPIA) from gene expression data based on the law of mass action [12]; this information is then used to identify a set of interactions and gene nodes as network biomarkers for diagnosing diseases [10]
Using our PPIA + ellipsoidFN optimization model with a Random Forest classifier, we identified 3 genes and 6 interactions, involving 14 genes in total (Supplementary Data Set 1), with a leave-one-out cross-validation classification accuracy of 96.97% (64/66), whereas the DEG + ellipsoidFN method selected 22 genes (Supplementary Data Set 2) with a classification accuracy of 93.94% (62/66)
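The leave-one-out figures quoted above (64/66 and 62/66 correct) come from holding out each of the 66 samples in turn, training on the rest, and counting correct predictions. The following is a minimal sketch of that evaluation loop only. It assumes a simple nearest-centroid rule as a stand-in for the Random Forest classifier, and the feature values, labels, and function names are hypothetical toy choices, not the study's data or code.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT Random Forest): pick the label whose
    training centroid is closest to x in squared Euclidean distance."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return min(centroids, key=lambda lab: sqdist(centroids[lab], x))


def loocv_accuracy(features, labels, predict=nearest_centroid_predict):
    """Leave-one-out cross-validation: hold out each sample once,
    train on the remainder, and count correct predictions."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct, len(features)


# Hypothetical toy data: rows are samples, columns are biomarker features
# (for example, PPIA values of selected interactions).
features = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
            [0.9, 1.0], [1.1, 0.8], [1.0, 1.2]]
labels = ["control"] * 3 + ["case"] * 3

correct, total = loocv_accuracy(features, labels)
accuracy_pct = 100.0 * correct / total
```

Any classifier with the same `predict(train_x, train_y, x)` shape could be passed in place of the stand-in; the reported percentages are simply `100 * correct / total` rounded to two decimals.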
Summary
The rapid advance of high-throughput technologies opens a new way for biomarker identification, which is an important but challenging task in biomedical research. A typical method is to integrate network information (such as protein-protein interaction data) with gene expression measurements [1,3]. Exploring network information for biomarker identification, Zhang et al. [6] recently defined a vector representation in edge space, based on the decomposed Pearson correlation coefficient (PCC), to find gene pairs that serve as edge biomarkers, which demonstrates the ability and potential of network information. Their results show that many edge biomarkers (i.e., protein or gene pairs) can distinguish normal and disease samples with high accuracy even though their differential expression is not significant. We take a similar approach to approximate PPI activity, which can be expressed by the law of mass action.
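As a rough illustration of how the law of mass action can turn expression values into a PPI activity estimate, consider a reversible binding A + B ⇌ AB, for which the equilibrium complex concentration is [AB] = K[A][B]. The sketch below assumes that gene expression can stand in for protein concentration and that the association constant K is the same for every interaction (set to 1 here). The gene names, the values, and the function names `ppia` and `ppia_profile` are hypothetical, and the exact definition used in the paper may differ.

```python
def ppia(expr_a, expr_b, k_assoc=1.0):
    """Mass-action estimate of complex concentration: [AB] = K * [A] * [B].
    k_assoc is an assumed, interaction-independent association constant."""
    return k_assoc * expr_a * expr_b


def ppia_profile(expression, interactions, k_assoc=1.0):
    """Compute one PPIA value per interaction per sample.

    expression:   dict mapping gene name -> list of per-sample expression values
    interactions: list of (gene_a, gene_b) pairs taken from a PPI network
    """
    profile = {}
    for gene_a, gene_b in interactions:
        profile[(gene_a, gene_b)] = [
            ppia(x, y, k_assoc)
            for x, y in zip(expression[gene_a], expression[gene_b])
        ]
    return profile


# Hypothetical toy data: samples 0-1 are controls, samples 2-3 are cases.
expression = {
    "GENE_A": [1.0, 3.0, 1.0, 3.0],
    "GENE_B": [3.0, 1.0, 1.0, 3.0],
}
interactions = [("GENE_A", "GENE_B")]

profile = ppia_profile(expression, interactions)
edge = profile[("GENE_A", "GENE_B")]
control_mean = sum(edge[:2]) / 2
case_mean = sum(edge[2:]) / 2
```

In this toy example each gene has the same mean expression in controls and cases, yet the mean PPIA of the edge differs between the two groups. This is the kind of edge-level signal that node-level differential expression analysis would miss.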