Abstract

This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, since this directly affects which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that family sizes generally follow power-law distributions, both within genomes and across larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. We review the genome evolution models that have been suggested to explain the shape of these distributions, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions about domain architecture evolution mechanisms that can be drawn from them. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or whether some have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.

Highlights

  • What kind of evolutionary mechanisms give rise to this kind of distribution of gene or domain family sizes within genomes? In one model by Huynen and van Nimwegen [33], every gene within a gene family is more or less likely to duplicate, depending on the utility of that gene family's function within the particular lineage of organisms studied. They showed that such a model matches the observed power laws.

  • Have the trends described above stood the test of time as more genomes have been sequenced and more domain families have been identified? We considered the 1943 UniProt proteomes covered by version 30.0 of Pfam, plotted the frequency Y of domain families that have precisely X members as a function of X, and fitted a power-law curve to the result.

  • Chothia and Gough [49] performed a similar study on 663 SCOP superfamily domains evaluated at many different thresholds. They found that while 516 (78%) superfamilies were common to all three kingdoms at a threshold of 10% of species in each kingdom, only 156 (24%) superfamilies were common to all three kingdoms at a threshold of 90%.
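
The power-law fit mentioned in the second highlight can be illustrated with a minimal sketch. The source does not state how the curve was fitted, so the least-squares fit in log-log space used here is an assumption, as are the function names `size_frequency` and `fit_power_law`. The toy input is synthetic and is not taken from Pfam. The fitted form is Y = c * X^(-a).

```python
import math
from collections import Counter


def size_frequency(family_sizes):
    """Map each family size X to the number Y of families with exactly X members."""
    return Counter(family_sizes)


def fit_power_law(freq):
    """Fit Y = c * X**(-a) by ordinary least squares on (log X, log Y).

    Assumption: a plain log-log regression, chosen for illustration only.
    Returns the pair (c, a).
    """
    xs = [math.log(x) for x in freq]
    ys = [math.log(freq[x]) for x in freq]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct family sizes")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic example (hypothetical data): 64 families of size 1,
# 16 of size 2, 4 of size 4 and 1 of size 8, i.e. Y = 64 * X**(-2).
toy_sizes = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
c, a = fit_power_law(size_frequency(toy_sizes))
```

Log-log regression is easy to follow but gives every distinct family size equal weight, including sparsely populated large sizes. Maximum-likelihood estimators are often preferred for real data for that reason.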

Summary

Overview

By studying the domain architectures of proteins, we can understand their evolution as a modular phenomenon, with high-level events enabling significant changes to take place in a time span much shorter than required by point mutations only. The conclusions drawn generally consider properties averaged for entire

Protein Domains
Mechanisms for Domain Architecture Change
Distribution of the Sizes of Domain Families
Kingdom and Age Distribution of Domain Families and Architectures
Domain Co-occurrence Networks
Supra-domains and Conserved Domain Order
Principles of Domain Architecture Evolution
Inferring Ancestral Domain Architectures
Polyphyletic Domain Architecture Evolution
Conclusions
Materials and Methods
Findings
Domain Architecture Analysis Software