Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly affects which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that family sizes generally follow power law distributions, both within genomes and within larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. We review the genome evolution models that have been suggested to explain the shape of these distributions, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or whether some have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
Highlights
What kind of evolutionary mechanisms give rise to this kind of distribution of gene or domain family sizes within genomes? In one model by Huynen and van Nimwegen [33], every gene within a gene family will be more or less likely to duplicate, depending on the utility of the function of that gene family within the particular lineage of organisms studied, and they showed that such a model matches the observed power laws.
Have the trends described above stood the test of time as more genomes have been sequenced and more domain families have been identified? We considered the 1943 UniProt proteomes covered by version 30.0 of Pfam, plotted the frequency Y of domain families that have precisely X members as a function of X, and fit a power law curve to this
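The frequency-versus-size analysis described above can be illustrated with a minimal sketch. It assumes a power law of the form Y = a·X^(-b) and estimates a and b by ordinary least squares on log-transformed values; the source does not state which fitting procedure was used, and other estimators exist. The function names and the toy family sizes are hypothetical and are not taken from Pfam.

```python
import math
from collections import Counter


def family_size_histogram(family_sizes):
    """Map each family size X to the number Y of families with exactly X members."""
    return Counter(family_sizes)


def fit_power_law(histogram):
    """Fit Y = a * X**(-b) by least squares on (log X, log Y); return (a, b).

    Assumption: a plain log-log regression, chosen for brevity only.
    """
    points = [(math.log(x), math.log(y))
              for x, y in histogram.items() if x > 0 and y > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two distinct family sizes")
    mean_x = sum(lx for lx, _ in points) / n
    mean_y = sum(ly for _, ly in points) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in points)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Hypothetical toy data constructed so that Y = 1000 * X**(-2) exactly.
toy_sizes = []
for size in (1, 2, 5, 10):
    toy_sizes.extend([size] * (1000 // (size * size)))

a, b = fit_power_law(family_size_histogram(toy_sizes))
print(f"a = {a:.1f}, b = {b:.2f}")
```

On real data the low-frequency tail is noisy, so a fit of this kind is usually a first look rather than a final estimate.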
Chothia and Gough [49] performed a similar study on 663 SCOP superfamily domains evaluated at many different thresholds and found that while 516 (78%) superfamilies were common to all three kingdoms at a threshold of 10% of species in each kingdom, only 156 (24%) superfamilies were common to all three kingdoms at a threshold of 90%.
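A threshold analysis of this kind can be sketched as follows. The sketch assumes that a superfamily counts as present in a kingdom when it occurs in at least the threshold fraction of that kingdom's species, and that "common" means present in every kingdom; this is one reading of the description above, not the authors' code. The function name, species labels, and superfamily labels are hypothetical.

```python
def common_superfamilies(presence, threshold):
    """Return superfamilies found in >= threshold of species in every kingdom.

    presence: {kingdom: {species: set of superfamily identifiers}}
    threshold: fraction between 0 and 1 (e.g. 0.1 or 0.9)
    """
    per_kingdom = []
    for species_map in presence.values():
        n_species = len(species_map)
        counts = {}
        for superfamilies in species_map.values():
            for sf in superfamilies:
                counts[sf] = counts.get(sf, 0) + 1
        # Keep superfamilies that reach the required fraction of species.
        per_kingdom.append(
            {sf for sf, c in counts.items() if c >= threshold * n_species}
        )
    return set.intersection(*per_kingdom) if per_kingdom else set()


# Hypothetical toy data: "A" is in every species, "B" in half of each kingdom.
toy_presence = {
    "Archaea": {"arc1": {"A", "B"}, "arc2": {"A"}},
    "Bacteria": {"bac1": {"A", "B"}, "bac2": {"A"}},
    "Eukaryota": {"euk1": {"A"}, "euk2": {"A", "B"}},
}

for t in (0.5, 0.9):
    print(t, sorted(common_superfamilies(toy_presence, t)))
```

Raising the threshold can only shrink the common set, which is the pattern reported above between the 10% and 90% thresholds.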
Summary
By studying the domain architectures of proteins, we can understand their evolution as a modular phenomenon, with high-level events enabling significant changes to take place in a time span much shorter than required by point mutations only. The conclusions drawn generally consider properties averaged for entire