Abstract

The protein domain co-occurrence network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. In an experiment on 66 randomly selected proteins, the best of the top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and an SVM-based method achieved accuracies of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. On average, these predictions had semantic similarity scores of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain; our SVM-based and neighbor-inference methods correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as there is one EC number available in the neighboring domains, our SVM-based and neighbor-counting methods correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCN-alignment-based method achieved a 93.45% similarity score when compared with Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationships among species.

Highlights

  • Biological systems, such as living cells, are composed of a large number of individual components

  • We analyzed the statistical properties of the domain co-occurrence networks (DCNs) of 15 plant species, yeast, and human, and found that they share several common features

  • This log-log plot shows that the number of nodes with a given degree follows a power-law distribution: the logarithm of the node count is approximately linear in the logarithm of the degree
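The log-log check described in the last highlight can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names (`build_dcn`, `degree_distribution`, `loglog_slope`) and the toy protein-to-domain mapping are hypothetical, and the straight-line fit is only a rough diagnostic for power-law behavior, not a rigorous statistical test.

```python
from collections import Counter
from itertools import combinations
import math


def build_dcn(protein_domains):
    """Build a DCN as adjacency sets: two domains are linked
    if they occur together in at least one protein."""
    adj = {}
    for domains in protein_domains.values():
        unique = sorted(set(domains))
        for d in unique:
            adj.setdefault(d, set())
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_distribution(adj):
    """Map each degree k >= 1 to the number of nodes with that degree."""
    return Counter(len(neighbors) for neighbors in adj.values() if neighbors)


def loglog_slope(dist):
    """Least-squares slope of log(node count) against log(degree).
    For a power law, count ~ k^(-gamma), the slope is close to -gamma."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(c) for c in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct degree values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy input: protein identifier -> domains found in that protein.
toy_proteins = {"P1": ["A", "B", "C"], "P2": ["A", "D"], "P3": ["E"]}
toy_dcn = build_dcn(toy_proteins)
toy_dist = degree_distribution(toy_dcn)
```

On a genome-scale network, a roughly constant negative slope across the degree range is what the log-log plot shows visually; the toy input here is far too small to exhibit it and only demonstrates the data flow.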


Summary

Introduction

Biological systems, such as living cells, are composed of a large number of individual components (e.g., proteins, DNA, RNA, and small molecules). These molecules interact and form networks to carry out biological functions. Network biology primarily focuses on metabolic, gene regulatory, and/or protein-protein interaction networks [9,10,11,12,13,14,15,16,17,18,19,20]. Since proteins and their interactions play central roles in almost all biological processes, protein interaction networks have been a major target of network biology. Protein interaction networks can be used to identify hub proteins that have critical biological functions, to predict biological pathways, and to infer the function of a protein from its interactions with other proteins of known function [22,23,24].
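The idea of inferring function from annotated neighbors can be illustrated with a generic neighbor-counting vote. This is a minimal sketch of that general technique, not the aggregated neighbor-counting method evaluated in the paper, whose details are not given in this excerpt. The function name `predict_terms`, the adjacency structure, and the placeholder term labels are all assumptions made for illustration.

```python
from collections import Counter


def predict_terms(target, adj, annotations, top_k=3):
    """Rank candidate terms for `target` by how many of its
    network neighbors are annotated with each term."""
    votes = Counter()
    for neighbor in adj.get(target, ()):
        # Each neighbor votes at most once per term.
        votes.update(set(annotations.get(neighbor, ())))
    # Sort by descending vote count, then by term name for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical toy network: node "X" is unannotated, its neighbors are annotated.
toy_adj = {"X": {"A", "B", "C"}, "A": {"X"}, "B": {"X"}, "C": {"X"}}
toy_annotations = {"A": ["T1", "T2"], "B": ["T1"], "C": ["T3"]}  # placeholder term labels
top_terms = predict_terms("X", toy_adj, toy_annotations)
```

Returning the top few terms rather than a single best guess mirrors the "top 3 predictions" evaluation mentioned in the abstract. The nodes can be proteins in an interaction network or domains in a DCN, since the vote only depends on the adjacency structure.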


