Ensemble disease gene prediction by clinical sample-based networks

Ping Luo, Qianghua Xiao, Bolin Chen, Li-Ping Tian, Fang-Xiang Wu

doi:10.1186/s12859-020-3346-8

Abstract

Background: Disease gene prediction is a critical and challenging task. Many computational methods have been developed to predict disease genes, which can reduce the money and time spent on experimental validation. Since proteins (the products of genes) usually work together to achieve a specific function, biomolecular networks, such as the protein-protein interaction (PPI) network and gene co-expression networks, are widely used to predict disease genes by analyzing the relationships between known disease genes and other genes in the networks. However, existing methods commonly use a universal static PPI network, which ignores the fact that PPIs are dynamic and that PPIs differ from patient to patient.

Results: To address these issues, we develop an ensemble algorithm to predict disease genes from clinical sample-based networks (EdgCSN). The algorithm first constructs a single sample-based network for each case sample of the disease under study. Then, these single sample-based networks are merged into several fused networks based on the clustering results of the samples. After that, logistic models are trained with centrality features extracted from the fused networks, and an ensemble strategy is used to predict the final probability of each gene being disease-associated. EdgCSN is evaluated on breast cancer (BC), thyroid cancer (TC) and Alzheimer’s disease (AD) and obtains AUC values of 0.970, 0.971 and 0.966, respectively, which are much better than those of the competing algorithms. Subsequent de novo validations also demonstrate the ability of EdgCSN to predict new disease genes.

Conclusions: In this study, we propose EdgCSN, an ensemble learning algorithm for predicting disease genes with models trained on centrality features extracted from clinical sample-based networks. Results of the leave-one-out cross validation show that EdgCSN performs much better than the competing algorithms in predicting BC-associated, TC-associated and AD-associated genes. De novo validations also show that EdgCSN is valuable for identifying new disease genes.
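
The last two steps described in the abstract (logistic models trained on centrality features, followed by an ensemble) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single feature (degree centrality) where the paper uses several centrality features, a plain gradient-descent logistic fit, and an ensemble that simply averages the per-network probabilities. The function names, the toy gene names and the toy labels are all hypothetical.

```python
import math

def degree_centrality(edges, genes):
    """Fraction of the other genes each gene is connected to in one fused network."""
    neighbors = {g: set() for g in genes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    denom = max(len(genes) - 1, 1)
    return {g: len(neighbors[g]) / denom for g in genes}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) on one scalar feature by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def ensemble_predict(fused_networks, genes, labels):
    """Train one model per fused network, then average the predicted probabilities.
    Averaging is an assumed ensemble rule chosen for illustration."""
    totals = {g: 0.0 for g in genes}
    labeled = [g for g in genes if g in labels]
    for edges in fused_networks:
        feat = degree_centrality(edges, genes)
        w, b = train_logistic([feat[g] for g in labeled],
                              [labels[g] for g in labeled])
        for g in genes:
            totals[g] += sigmoid(w * feat[g] + b)
    return {g: totals[g] / len(fused_networks) for g in genes}

# Toy input: two fused networks over six hypothetical genes.
genes = ["A", "B", "C", "D", "E", "F"]
fused_networks = [
    [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F")],
    [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E"), ("C", "F")],
]
# 1 = known disease gene, 0 = assumed non-disease gene; C and D are unlabeled.
labels = {"A": 1, "B": 1, "E": 0, "F": 0}

scores = ensemble_predict(fused_networks, genes, labels)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

In this toy setting the unlabeled genes C and D receive scores from models trained only on the labeled genes, which mirrors how candidate genes would be ranked by their averaged probability.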

Highlights

Disease gene prediction is a critical and challenging task
Interacting proteins usually have similar functions, which means algorithms can predict new disease genes based on their relationships with known disease genes in the protein-protein interaction (PPI) network
A single sample-based network is constructed for each case sample by combining clinical samples and the universal static PPI network
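
As a rough illustration of the last highlight, the sketch below builds one network per sample and then fuses the networks of each sample cluster. The construction rule is an assumption made for this example, not the rule from the paper: a static PPI edge is kept for a sample only when both genes are expressed above a threshold. The fusion rule (keep edges present in at least a given fraction of a cluster's samples), the cluster assignment, and every name and value here are likewise hypothetical.

```python
from collections import Counter

def single_sample_network(ppi_edges, expression, threshold=1.0):
    """Keep a static PPI edge only if both genes are expressed in this sample.
    This filtering rule is assumed for illustration."""
    return {
        tuple(sorted((u, v)))
        for u, v in ppi_edges
        if expression.get(u, 0.0) >= threshold
        and expression.get(v, 0.0) >= threshold
    }

def fuse_networks(sample_networks, min_fraction=0.5):
    """Merge the networks of one cluster: keep edges seen in enough samples."""
    counts = Counter(edge for net in sample_networks for edge in net)
    needed = min_fraction * len(sample_networks)
    return {edge for edge, count in counts.items() if count >= needed}

# Toy universal static PPI network and per-sample expression values.
ppi_edges = [("A", "B"), ("B", "C"), ("C", "D")]
samples = {
    "s1": {"A": 2.0, "B": 3.0, "C": 0.1, "D": 1.5},
    "s2": {"A": 1.2, "B": 2.2, "C": 1.4, "D": 0.0},
    "s3": {"A": 0.2, "B": 2.5, "C": 1.9, "D": 2.1},
}

networks = {name: single_sample_network(ppi_edges, expr)
            for name, expr in samples.items()}

# Clusters are given by hand here; the paper derives them by clustering samples.
clusters = [["s1", "s2"], ["s3"]]
fused = [fuse_networks([networks[s] for s in cluster]) for cluster in clusters]
strict = fuse_networks([networks["s1"], networks["s2"]], min_fraction=1.0)
```

Raising `min_fraction` makes a fused network keep only edges shared by more samples in the cluster, which is one simple way to trade coverage for consistency.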

Summary

Introduction

Disease gene prediction is a critical yet challenging task. Many computational methods have been developed to predict disease genes, which can reduce the money and time spent on experimental validation. To address the limitations of a universal static PPI network, the authors develop an ensemble algorithm that predicts disease genes from clinical sample-based networks (EdgCSN). Logistic models are trained with centrality features extracted from the fused networks, and an ensemble strategy is used to predict the final probability of each gene being disease-associated. Subsequent de novo validations demonstrate the ability of EdgCSN to predict new disease genes. On the one hand, interacting proteins (genes) usually have similar functions, which means algorithms can predict new disease genes based on their relationships with known disease genes in the PPI network. Due to the network property of PPIs, most network analysis algorithms can

Journal: BMC Bioinformatics	Publication Date: Mar 1, 2020
Citations: 8	License type: open-access
