Abstract

During the course of evolution, new proteins are produced very largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution, organisms of increasing complexity have arisen. In this paper we identify those superfamilies whose relative sizes in different organisms are highly correlated with the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes, we took the number of different cell types of which they are composed. Of 1,219 superfamilies, there are 194 whose sizes in the 38 organisms are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation, and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.

Highlights

  • During the course of evolution, the complexity of organisms as measured by the total number of their cells and the number of different cell types has increased greatly

  • Our work suggests that the two basic types of duplication have different relative contributions to proteomes. “Conservative expansions” do not correlate with an increase in the number of different cell types, but enlarge the genome size

  • The domain superfamilies are defined in the Structural Classification of Proteins (SCOP) database [21], and our analysis focuses on the seven well-defined classes a to g


Introduction

During the course of evolution, the complexity of organisms as measured by the total number of their cells and the number of different cell types has increased greatly. The different processes that have produced these increases in biological complexity are of fundamental interest, and the data available from complete genome sequences should eventually allow us to determine their general nature and relative contributions. Particular emphasis was placed on extensions in the repertoire of proteins involved in the regulation of expression and in signal transduction; for a review see Kirschner and Gerhart [7]. From analyses of prokaryote genome sequences, van Nimwegen [8] and Ranea et al. [9] have shown that the number of genes in different functional categories scales as a power law of the total number of genes. High exponents, ~2, are found for proteins involved in transcription and its regulation and for those involved in signal transduction. Van Nimwegen obtained somewhat similar results from an analysis of the eukaryote genome sequences available at the time he carried out that work [8].
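
The power-law scaling mentioned above can be illustrated with a minimal sketch. It assumes the relation n_c = k · N^α, where N is the total number of genes in a genome, n_c is the number of genes in one functional category, and α is the scaling exponent, and it estimates α by a least-squares line fit in log-log space. The function name `fit_power_law_exponent` and the gene counts are hypothetical: the numbers are synthetic values chosen to follow an exponent of 2 exactly and are not data from refs [8] or [9], whose fitting procedures may differ.

```python
import numpy as np


def fit_power_law_exponent(total_genes, category_genes):
    """Estimate (alpha, k) in category = k * total**alpha.

    Taking logarithms gives log(category) = alpha * log(total) + log(k),
    so alpha is the slope of a straight-line fit in log-log space.
    """
    log_total = np.log(np.asarray(total_genes, dtype=float))
    log_category = np.log(np.asarray(category_genes, dtype=float))
    alpha, log_k = np.polyfit(log_total, log_category, 1)
    return alpha, np.exp(log_k)


# Synthetic, illustrative genomes (NOT real data): total gene counts and
# category sizes constructed to follow an exact quadratic power law.
total = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
category = 1e-5 * total**2

alpha, k = fit_power_law_exponent(total, category)
print(f"estimated exponent alpha = {alpha:.3f}, prefactor k = {k:.2e}")
```

An exponent near 2 means that doubling the total gene count roughly quadruples the size of the category, which is the sense in which regulatory and signalling categories grow faster than the genome as a whole.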
