Abstract

Whole-genome protein-protein association networks are not random, and their topological properties stem from genome evolution mechanisms. In fact, more connected but less clustered proteins are related to genes that, in general, have more paralogs than other genes, indicating frequent previous gene duplication episodes. On the other hand, genes related to conserved biological functions have few or no paralogs and yield proteins that are highly connected and clustered. These general network characteristics must have an evolutionary explanation. Using data from the STRING database, we present experimental evidence that the protein degree distributions of organisms are not scale free and, moreover, show an increased probability for high-degree nodes. Furthermore, based on this experimental evidence, we propose a simulation model for genome evolution in which genes in a network are either acquired de novo using a preferential attachment rule, or duplicated with a probability that grows linearly with gene degree and decreases with its clustering coefficient. For the first time, a model yields results that simultaneously describe different topological distributions. This model also correctly predicts that producing protein-protein association networks with numbers of links and nodes in the observed range for Eukaryotes requires 90% gene duplication and 10% de novo gene acquisition. This scenario implies a universal mechanism for genome evolution.

Highlights

  • Genome evolution is determined first by the processes that modify DNA and by those mechanisms that either neutrally keep or naturally select these mutations by their phenotypic effects

  • We propose an adequate ordering for genes to globally illustrate topological properties of the protein-protein association matrix. Based on these conclusions, drawn from the information provided by the STRING database, we propose a genome evolution dynamics in which the probability that a gene duplicates grows with its degree and decreases depending on how clustered it is

  • In this paper we have presented evidence obtained from protein-protein association data that degree distribution is not scale free, presenting an increased probability for high degree nodes, and that there are a few hub nodes in these networks, probably organized in a hierarchical way


Summary

Introduction

Genome evolution is determined first by the processes that modify DNA and by those mechanisms that either neutrally keep or naturally select these mutations by their phenotypic effects. Barabási and collaborators [1,2] have described genomes of different organisms as networks where nodes are either genes or proteins, and links correspond to associations between the nodes. They proposed an evolution dynamics for the genome in which genes are sequentially added to a network following a preferential attachment rule: each newly incorporated gene interacts with a gene already on the network with a probability that is proportional to its degree, that is, to the number of other genes with which it already interacts. We propose an adequate ordering for genes to globally illustrate topological properties of the protein-protein association matrix. Based on these conclusions, drawn from the information provided by the STRING database, we propose a genome evolution dynamics in which the probability that a gene duplicates grows with its degree and decreases depending on how clustered it is. The results of these simulations are capable of describing different aspects of the network topology, besides predicting the ratio of duplicated to de novo acquired genes.
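The growth rules described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code, and several details are assumptions made only for illustration: the duplication weight is taken as k(1 - C) for a gene with degree k and clustering coefficient C (the text states only that the probability grows linearly with degree and decreases with clustering), a duplicated gene inherits all links of its parent and is also linked to the parent, a de novo gene attaches with a single link, and the names `grow_network`, `clustering` and `p_dup` are hypothetical. The default `p_dup=0.9` mirrors the 90% duplication share quoted in the abstract.

```python
import random
from itertools import combinations


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    neigh = adj[node]
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def add_edge(adj, a, b):
    """Add an undirected link between genes a and b."""
    adj[a].add(b)
    adj[b].add(a)


def grow_network(n_genes, p_dup=0.9, seed=None):
    """Grow a gene network by duplication (prob. p_dup) or de novo acquisition."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # seed network: two associated genes
    while len(adj) < n_genes:
        nodes = list(adj)
        new = len(adj)  # label of the gene being added
        if rng.random() < p_dup:
            # Duplication. Assumed weight: k * (1 - C), i.e. linear in degree
            # and decreasing with clustering; the tiny offset keeps the total
            # weight positive when every gene is fully clustered.
            weights = [len(adj[v]) * (1.0 - clustering(adj, v)) + 1e-9
                       for v in nodes]
            parent = rng.choices(nodes, weights=weights, k=1)[0]
            inherited = set(adj[parent])  # copy before modifying the graph
            adj[new] = set()
            for v in inherited:
                add_edge(adj, new, v)
            add_edge(adj, new, parent)  # assumption: paralogs stay associated
        else:
            # De novo acquisition: preferential attachment, one link whose
            # target is chosen with probability proportional to its degree.
            weights = [len(adj[v]) for v in nodes]
            target = rng.choices(nodes, weights=weights, k=1)[0]
            adj[new] = set()
            add_edge(adj, new, target)
    return adj


if __name__ == "__main__":
    net = grow_network(200, p_dup=0.9, seed=42)
    n_links = sum(len(nb) for nb in net.values()) // 2
    print(f"{len(net)} genes, {n_links} links")
```

Recomputing every clustering coefficient at each step keeps the sketch short but is slow, so it is only practical for small networks; varying `p_dup` shows how the duplication share changes the number of links obtained for a given number of genes.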

Results
Discussion and Conclusions
