Abstract

Background
Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, its domains might also be associated with that disease and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Identifying domains associated with human diseases would therefore greatly improve our understanding of the mechanisms of complex human diseases and, further, their prevention, diagnosis and treatment.

Methods
Based on phenotypic similarities among diseases, we first group diseases into overlapping modules. We then develop a framework to infer associations between domains and diseases through known relationships between diseases and modules, domains and proteins, and proteins and disease modules. Five approaches, namely Association, Maximum likelihood estimation (MLE), Domain-disease pair exclusion analysis (DPEA), Bayesian, and Parsimonious explanation (PE), are developed to predict domain-disease associations.

Results
We demonstrate the effectiveness of all five approaches via a series of validation experiments and show the robustness of the MLE, Bayesian and PE approaches to the parameters involved. We also study the effect of disease modularization on inferring novel domain-disease associations. In validation, the AUC (area under the receiver operating characteristic curve) scores for the Bayesian, MLE, DPEA, PE and Association approaches are 0.86, 0.84, 0.83, 0.83 and 0.79, respectively, indicating that these approaches are useful for predicting domain-disease relationships. Finally, we apply the Bayesian approach to infer domains associated with two common diseases, Crohn’s disease and type 2 diabetes.

Conclusions
The Bayesian approach has the best performance for the inference of domain-disease relationships. The predicted landscape between domains and diseases provides a more detailed view of disease mechanisms.

Electronic supplementary material
The online version of this article (doi:10.1186/s12918-015-0247-y) contains supplementary material, which is available to authorized users.
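The abstract's Methods part groups diseases into overlapping modules by phenotypic similarity and then links domains to diseases through module genes and the domains their proteins contain. The abstract does not say how modules are formed, so the Python sketch below assumes, purely for illustration, that each disease's module is the disease itself plus its k most phenotypically similar diseases. The function names `build_overlapping_modules` and `module_domain_counts` and the parameter `k` are hypothetical. The second function counts, for one module, how many distinct module genes encode a protein carrying each domain: the kind of raw evidence a scoring approach could build on, not the authors' own scoring.

```python
from typing import Dict, Set


def build_overlapping_modules(
    similarity: Dict[str, Dict[str, float]], k: int = 2
) -> Dict[str, Set[str]]:
    """Hypothetical modularization: each disease plus its k most similar diseases.

    Modules overlap because one disease can be among the top-k neighbours
    of several other diseases.
    """
    modules: Dict[str, Set[str]] = {}
    for disease, neighbours in similarity.items():
        # Rank the other diseases by phenotypic similarity, highest first.
        ranked = sorted(
            (d for d in neighbours if d != disease),
            key=lambda d: neighbours[d],
            reverse=True,
        )
        modules[disease] = {disease, *ranked[:k]}
    return modules


def module_domain_counts(
    module: Set[str],
    disease_genes: Dict[str, Set[str]],
    gene_domains: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Count the distinct module genes whose protein contains each domain."""
    genes: Set[str] = set()
    for disease in module:
        # Pool the genes of every disease in the module (duplicates collapse).
        genes |= disease_genes.get(disease, set())
    counts: Dict[str, int] = {}
    for gene in genes:
        for domain in gene_domains.get(gene, set()):
            counts[domain] = counts.get(domain, 0) + 1
    return counts
```

With toy inputs in which two diseases are each other's nearest neighbour, `k=1` places each in the other's module, so a domain carried by genes of both diseases accumulates counts from both. This pooling is one way modularization can lend evidence to diseases that have few known genes.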

Highlights

  • Protein domains can be viewed as portable units of biological function that define the functional properties of proteins

  • Recent developments in human genetics and computational biology have made it possible to identify a number of genes that are associated with complex diseases [1]

  • We further demonstrate the effectiveness and robustness of these approaches through a series of large-scale validation experiments, discuss the benefits brought by modularization, and compare the performance of the five approaches in terms of three evaluation criteria (AUC score, accuracy, and mean rank ratio)
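The validation results are reported as AUC scores, and the highlights also name accuracy and mean rank ratio as criteria. As a minimal sketch, the Python below computes AUC in its rank-statistic form (the probability that a randomly chosen known association scores above a randomly chosen non-association, with ties counted as one half) together with a mean rank ratio under an assumed definition: the rank of each held-out association among its candidates, divided by the number of candidates, averaged over held-out cases. The exact definitions used in the paper, including its accuracy criterion, are not given here, so the function names and the rank-ratio formula are assumptions.

```python
from typing import Sequence, Tuple


def auc_score(positive: Sequence[float], negative: Sequence[float]) -> float:
    """AUC as P(random positive score > random negative score), ties counted as 0.5."""
    if not positive or not negative:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


def rank_ratio(true_score: float, candidate_scores: Sequence[float]) -> float:
    """Assumed definition: rank of the held-out pair (1 = best) / number of candidates.

    candidate_scores is assumed to include true_score itself.
    """
    rank = 1 + sum(1 for s in candidate_scores if s > true_score)
    return rank / len(candidate_scores)


def mean_rank_ratio(cases: Sequence[Tuple[float, Sequence[float]]]) -> float:
    """Average rank ratio over (true_score, candidate_scores) cases; lower is better."""
    ratios = [rank_ratio(t, c) for t, c in cases]
    return sum(ratios) / len(ratios)
```

The pairwise loop is quadratic but transparent; for large validation sets a sort-based rank computation, or `sklearn.metrics.roc_auc_score`, gives the same AUC.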


Summary

Introduction

Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Both studies rely on a relatively small set of domain-disease associations, compiled by linking domains that contain known deleterious nsSNPs to the human diseases associated with these nsSNPs [15]. These studies also depend on domain-domain interactions, which are generally incomplete and contain many false positive and false negative interactions [14, 15]. To circumvent these problems, we seek evidence of domain-disease associations at the gene level: instead of relying on the small number of disease mutations located in domains, we draw on the abundant, publicly available gene-disease associations [16,17,18,19,20]. The basic idea is that if a disease is associated with many genes whose products (proteins) contain common domains, those common domains are more likely to be associated with the disease.
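The basic idea in the introduction, that domains shared by many of a disease's gene products are more likely to be disease-associated, can be illustrated with a standard enrichment test. The Python sketch below swaps in a one-sided hypergeometric test, which is not necessarily what the authors use: it asks how surprising it is that k of the n disease genes carry a domain when K of the N background genes do. Treating every domain-annotated gene as the background, and the names `hypergeom_sf` and `domain_enrichment`, are assumptions made for illustration.

```python
from math import comb
from typing import Dict, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def domain_enrichment(
    disease_genes: Set[str], gene_domains: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Hypothetical helper: enrichment p-value of each domain among a disease's genes."""
    background = set(gene_domains)  # assumption: all domain-annotated genes
    hits = disease_genes & background  # disease genes with domain annotations
    N, n = len(background), len(hits)
    all_domains: Set[str] = set().union(*gene_domains.values()) if gene_domains else set()
    p_values: Dict[str, float] = {}
    for domain in all_domains:
        K = sum(1 for g in background if domain in gene_domains[g])
        k = sum(1 for g in hits if domain in gene_domains[g])
        if k > 0:  # only test domains actually seen among the disease genes
            p_values[domain] = hypergeom_sf(k, N, K, n)
    return p_values
```

Exact summation with `math.comb` avoids an extra dependency; for large gene sets, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same upper tail.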

